diff CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README @ 69:33d812a61356
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author: jpayne
date: Tue, 18 Mar 2025 17:55:14 -0400
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/README	Tue Mar 18 17:55:14 2025 -0400
@@ -0,0 +1,935 @@
+-=- MUMmer3.x README -=-
+
+** NOTE **
+A comprehensive HTML user manual is available in the docs/web/manual
+subdirectory or at http://mummer.sourceforge.net/manual
+
+MUMmer is now an open source package!  Please contact us if you would like
+to contribute to the MUMmer project.  For more information or the latest
+release please visit the MUMmer homepage at http://mummer.sourceforge.net
+
+Please refer to the INSTALL file for installation instructions.  This file
+contains brief descriptions of all executables in the base directory and
+general information about the MUMmer package.
+
+
+
+-- DESCRIPTION --
+   MUMmer is a system for rapidly aligning entire genomes.  The current
+version (release 3.0) can find all 20 base pair maximal exact matches between
+two bacterial genomes of ~5 million base pairs each in 20 seconds, using 90 MB
+of memory, on a typical 1.8 GHz Linux desktop computer.  MUMmer can also align
+incomplete genomes; it handles the 100s or 1000s of contigs from a shotgun
+sequencing project with ease, and will align them to another set of contigs or
+a genome, using the nucmer utility included with the system.  The promer
+utility takes this a step further by generating alignments based upon the
+six-frame translations of both input sequences.  promer permits the alignment
+of genomes for which the proteins are similar but the DNA sequence is too
+divergent to detect similarity.  See the nucmer and promer readme files in the
+"docs/" subdirectory for more details.  MUMmer is open source, so all we ask
+is that you cite our most recent paper in any publications that use this
+system:
+
+        (Version 3.0 described)
+  Versatile and open software for comparing large genomes.
+  S. Kurtz, A. Phillippy, A.L. Delcher,
+  M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
+  Genome Biology (2004), 5:R12.
+
+        (Version 2.1 described)
+  Fast algorithms for large-scale genome alignment and comparison.
+  A.L. Delcher, A. Phillippy, J. Carlton, and S.L. Salzberg.
+  Nucleic Acids Research 30:11 (2002), 2478-2483.
+
+        (Version 1.0 described)
+  Alignment of Whole Genomes.
+  A.L. Delcher, S. Kasif,
+  R.D. Fleischmann, J. Peterson, O. White, and S.L. Salzberg.
+  Nucleic Acids Research, 27:11 (1999), 2369-2376.
+
+
+-- RUNNING MUMmer3.0 --
+   MUMmer3.0 comprises a number of utilities and scripts.  For general
+purposes, the scripts "run-mummer1", "run-mummer3", "nucmer", and "promer"
+will be all that is needed.  See their descriptions in the "RUNNING THE MUMmer
+SCRIPTS" section, or refer to their individual documentation in the "docs/"
+subdirectory.  Refer to the "RUNNING THE MUMmer UTILITIES" section for a brief
+description of all of the utilities in this directory.
+
+Simple use case:
+   Given a file containing a single reference sequence (ref.seq) in
+FastA format and another file containing multiple sequences in FastA
+format (qry.seq), type the following at the command line:
+
+   './nucmer  -p <prefix>  ref.seq  qry.seq'
+
+   To produce the following files:
+        <prefix>.delta
+
+or
+
+   './run-mummer3.csh  ref.seq  qry.seq  <prefix>'
+
+   To produce the following files:
+        <prefix>.out
+        <prefix>.gaps
+        <prefix>.align
+        <prefix>.errorsgaps
+
+   Please read the utility-specific documentation in the "docs/" subdirectory
+for descriptions of these files and information on how to change the
+alignment parameters for the scripts (minimum match length, etc.), or see
+the notes below in the "RUNNING THE MUMmer SCRIPTS" section for a brief
+explanation.
+
+   To see a simple gnuplot output, if you have gnuplot installed, run
+the perl script 'mummerplot' on the output files. This script can be run
+on mummer output (.out), or nucmer/promer output (.delta). Edit the
+<prefix>.gp file that is created to change colors, line thicknesses, etc. or
+explore the <prefix>.[fr]plot file to see the data collection.
+
+   './mummerplot  -p <prefix>  <prefix>.out'
+
+   Or you can use the web viewer for completed microbial genomes:
+http://www.tigr.org/CMR
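
The simple use case above can be scripted.  The following is a minimal Python sketch that only assembles the two command lines shown in this section and prints them; it does not run anything.  The helper names `nucmer_command` and `mummerplot_command` are hypothetical, and the sketch assumes the executables sit in the current directory, as in the examples above.

```python
import shlex
from typing import List


def nucmer_command(prefix: str, ref: str, qry: str,
                   exe: str = "./nucmer") -> List[str]:
    """Build the argument list for: ./nucmer -p <prefix> ref.seq qry.seq"""
    return [exe, "-p", prefix, ref, qry]


def mummerplot_command(prefix: str, matchfile: str,
                       exe: str = "./mummerplot") -> List[str]:
    """Build the argument list for: ./mummerplot -p <prefix> <matchfile>"""
    return [exe, "-p", prefix, matchfile]


# Illustrative file names only; "out" is an arbitrary prefix.
align = nucmer_command("out", "ref.seq", "qry.seq")
plot = mummerplot_command("out", "out.delta")  # nucmer writes <prefix>.delta

# shlex.join quotes arguments safely for display or for pasting into a shell.
print(shlex.join(align))
print(shlex.join(plot))
```

Keeping the commands as argument lists means they could later be handed to `subprocess.run` without extra shell quoting.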
+
+
+
+-- RUNNING THE MUMmer SCRIPTS --
+   Because of MUMmer's modular design, it may be necessary to use a number
+of separate programs to produce the desired output.  The MUMmer scripts
+attempt to simplify this process by wrapping various utilities into packages
+that can perform standard alignment requests.  Listed below are brief
+descriptions and usage definitions for these scripts.  Please refer to the
+"docs/" subdirectory for a more detailed description of each script.
+
+
+   ** nucmer **
+
+        DESCRIPTION:
+        nucmer is for the all-vs-all comparison of nucleotide sequences
+        contained in multi-FastA data files.  It is best used for highly
+        similar sequences that may have large rearrangements.  Common use
+        cases are: comparing two unfinished shotgun sequencing assemblies,
+        mapping an unfinished sequencing assembly to a finished genome, and
+        comparing two fairly similar genomes that may have large
+        rearrangements and duplications.  Please refer to "docs/nucmer.README"
+        for more information regarding this script and its output, or type
+        'nucmer -h' for a list of its options.
+
+        USAGE:
+        nucmer  [options]  <reference>  <query>
+
+        [options]    type 'nucmer -h' for a list of options.
+        <reference>  specifies the multi-FastA sequence file that contains
+                     the reference sequences, to be aligned with the queries.
+        <query>      specifies the multi-FastA sequence file that contains
+                     the query sequences, to be aligned with the references.
+
+        OUTPUT:
+        out.delta    the delta encoded alignments between the reference and
+                     query sequences.  This file can be parsed with any of
+                     the show-* programs which are described in the "RUNNING
+                     THE MUMmer UTILITIES" section.
+
+        NOTES:
+        All output coordinates reference the forward strand of the involved
+        sequence, regardless of the match direction. Also, nucmer now uses
+        only matches that are unique in the reference sequence by default;
+        use the '--mum' or '--maxmatch' options to change this behavior.
+
+
+   ** promer **
+
+        DESCRIPTION:
+        promer is for the protein level, all-vs-all comparison of nucleotide
+        sequences contained in multi-FastA data files.  The nucleotide input
+        files are translated in all 6 reading frames and then aligned to one
+        another via the same methods as nucmer.  It is best used for highly
+        divergent sequences that may have moderate to high similarity on the
+        protein level.  Common use cases are: identifying syntenic regions
+        between highly divergent genomes, comparative genome annotation i.e.
+        using an already annotated genome to help in the annotation of a
+        newly sequenced genome, and the general comparison of two fairly
+        divergent genomes that have large rearrangements and may only be
+        similar on the protein level. Please refer to "docs/promer.README"
+        for more information regarding this script and its output, or type
+        'promer -h' for a list of its options.
+
+        USAGE:
+        promer  [options]  <reference>  <query>
+
+        [options]    type 'promer -h' for a list of options.
+        <reference>  specifies the multi-FastA sequence file that contains
+                     the reference sequences, to be aligned with the queries.
+        <query>      specifies the multi-FastA sequence file that contains
+                     the query sequences, to be aligned with the references.
+
+        OUTPUT:
+        out.delta    the delta encoded alignments between the reference and
+                     query sequences.  This file can be parsed with any of
+                     the show-* programs which are described in the "RUNNING
+                     THE MUMmer UTILITIES" section.
+
+        NOTES:
+        All output coordinates reference the forward strand of the involved
+        sequence, regardless of the match direction, and are measured in
+        nucleotides with the exception of the delta integers which are
+        measured in amino acids (1 delta int = 3 nucleotides). Also, promer
+        now uses only matches that are unique in the reference sequence by
+        default; use the '--mum' or '--maxmatch' options to change this
+        behavior.
+
+
+   ** run-mummer1 **
+
+        DESCRIPTION:
+        This script is taken directly from MUMmer1.0 and is best used to
+        align two sequences that are highly similar and have no
+        rearrangements.  A common use case is aligning two finished bacterial
+        chromosomes.  Please refer to "docs/run-mummer1.README" for the
+        original documentation for this script and its output.
+
+        USAGE:
+        run-mummer1  <seq1>  <seq2>  <tag>  [-r]
+
+        <seq1>  specifies the file with the first sequence in FastA format.
+                No more than one sequence is allowed.
+        <seq2>  specifies the file with the second sequence in FastA format.
+                No more than one sequence is allowed.
+        <tag>   specifies the prefix to be used for the output files.
+        [-r]    is an optional parameter that will reverse complement the
+                second sequence.
+
+        OUTPUT:
+        out.align       the out.gaps file interspersed with the alignments
+                        of the gaps.
+        out.errorsgaps  the out.gaps file with an extra column stating the
+                        number of errors contained in each gap.
+        out.gaps        an ordered (clustered) list of matches with position
+                        information, and gap distances between each match.
+        out.out         a list of all maximal unique matches between the two
+                        input sequences ordered by their start position in the
+                        second sequence.
+
+        NOTES:
+        All output coordinates reference their respective strand.  This means
+        that if the -r switch is active, coordinates that reference the
+        second sequence will be relative to the reverse complement of the
+        second sequence.  Please use nucmer or promer if this coordinate
+        system is confusing.
+            Eventually, this script's components will be rewritten to work
+        with the new MUMmer format standards and phased out in favor of the
+        new components and wrapping script.
+
+
+   ** run-mummer3 **
+
+        DESCRIPTION:
+        This script is the improved version of the MUMmer1.0 run-mummer1
+        script.  It uses a new clustering algorithm that appropriately
+        handles multiple sequence rearrangements and inversions.  Because
+        of this, it can handle more divergent sequences better than
+        run-mummer1.  In addition, it allows a multi-FastA query file for
+        1-vs-many sequence comparisons.  Please refer to
+        "docs/run-mummer3.README" for more detailed documentation of this
+        script and its output.
+
+        USAGE:
+        run-mummer3  <reference>  <query>  <prefix>
+
+        <reference>  specifies the file with the reference sequence in FastA
+                     format.  No more than one sequence is allowed.
+        <query>      specifies the multi-FastA sequence file that contains
+                     the query sequences.
+        <prefix>     specifies the file prefix for the output files.
+
+        OUTPUT:
+        out.align       the out.gaps file interspersed with the alignments
+                        of the gaps.
+        out.errorsgaps  the out.gaps file with an extra column stating the
+                        number of errors contained in each gap.
+        out.gaps        an ordered (clustered) list of matches with position
+                        information, and gap distances between each match.
+        out.out         a list of all maximal unique matches between the two
+                        input sequences ordered by their start position in the
+                        second sequence.
+
+        NOTES:
+        All output coordinates reference their respective strand.  This means
+        that for all reverse matches, the coordinates that reference the
+        query sequence will be relative to the reverse complement of the
+        query sequence.  Please use nucmer or promer if this coordinate
+        system is confusing.
+
+
+   ** dnadiff **
+
+        DESCRIPTION:
+        This script is a wrapper around nucmer that builds an
+        alignment using default parameters, and runs many of nucmer's
+        helper scripts to process the output and report alignment
+        statistics, SNPs, breakpoints, etc. It is designed for
+        evaluating the sequence and structural similarity of two
+        highly similar sequence sets, e.g. comparing two different
+        assemblies of the same organism, or comparing two strains of
+        the same species.  Please refer to "docs/dnadiff.README" for
+        more information regarding this script and its output, or type
+        'dnadiff -h' for a list of its options.
+
+        USAGE: dnadiff  [options]  <reference>  <query>
+          or   dnadiff  [options]  -d <delta file>
+
+        <reference>       Set the input reference multi-FASTA filename
+        <query>           Set the input query multi-FASTA filename
+           or
+        <delta file>      Unfiltered .delta alignment file from nucmer
+
+        OUTPUT:
+        .report  - Summary of alignments, differences and SNPs
+        .delta   - Standard nucmer alignment output
+        .1delta  - 1-to-1 alignment from delta-filter -1
+        .mdelta  - M-to-M alignment from delta-filter -m
+        .1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
+        .mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
+        .snps    - SNPs from show-snps -rlTHC .1delta
+        .rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
+        .qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
+        .unref   - Unaligned reference IDs and lengths (if applicable)
+        .unqry   - Unaligned query IDs and lengths (if applicable)
+
+        NOTES:
+        The report file generated by this script can be useful for
+        comparing the differences between two similar genomes or
+        assemblies. The other outputs generated by this script are in
+        unlabeled tabular format, so please refer to the utility
+        specific documentation for interpreting them. A full
+        description of the report file is given in "docs/dnadiff.README".
+
+
+-- RUNNING THE MUMmer UTILITIES --
+   The MUMmer package consists of various utilities that can interact with
+the 'mummer' program.  'mummer' performs all maximal and maximal unique
+matching, and all other utilities were designed to process the input and
+output of this program and its related scripts, in order to extract
+additional information from the output.  Listed below are the descriptions
+and usage definitions for these utilities.
+
+
+   ** annotate **
+
+        DESCRIPTION:
+        This program reads the output of the 'gaps' program and adds alignment
+        information to it.  It is part of the original MUMmer1.0 pipeline and
+        can only be used on the output of the 'gaps' program.
+
+        USAGE:
+        annotate  <gapsfile>  <seq2>
+
+        <gapsfile>  the output of the 'gaps' program.
+        <seq2>      the file containing the second sequence in the comparison.
+
+        OUTPUT:
+        stdout           the 'gaps' output interspersed with the alignments of
+                         the gaps between adjacent MUMs.  An alignment of a
+                         gap comes after the second MUM defining the gap, and
+                         alignment errors are marked with a '^' character.
+        witherrors.gaps  the 'gaps' output with an appended column that lists
+                         the number of alignment errors for each gap.
+
+        NOTES:
+        This program will eventually be dropped in favor of the combineMUMs
+        or nucmer match extenders, but persists for the time being.
+
+
+   ** combineMUMs **
+
+        DESCRIPTION:
+        This program reads the output of the 'mgaps' program and adds alignment
+        information to it.  It is part of the MUMmer3.0 pipeline and can only
+        be used on the output of the 'mgaps' program.  The -D option alters
+        this behavior and outputs only the positions of difference, e.g. SNPs.
+
+        USAGE:
+        combineMUMs  [options]  <reference>  <query>  <mgapsfile>
+
+        [options]    type 'combineMUMs -h' for a list of options.
+        <reference>  the FastA reference file used in the comparison.
+        <query>      the multi-FastA query file used in the comparison.
+        <mgapsfile>  the output of the 'mgaps' program run on the match
+                     list produced by 'mummer' for the reference and query
+                     files.
+
+        OUTPUT:
+        stdout           the 'mgaps' output interspersed with the alignments
+                         of the gaps between adjacent MUMs.  An alignment of a
+                         gap comes after the second MUM defining the gap, and
+                         alignment errors are marked with a '^' character.  At
+                         the end of each cluster is a summary line (keyword
+                         "Region") noting the bounds of the cluster in the
+                         reference and query sequences, the total number of
+                         errors for the region, the length of the region and
+                         the percent error of the region.
+        witherrors.gaps  the 'mgaps' output with an appended column that lists
+                         the number of alignment errors for each gap.
+
+
+   ** delta-filter **
+
+        DESCRIPTION:
+
+        This program filters a delta alignment file produced by either
+        nucmer or promer, leaving only the desired alignments which
+        are output to stdout in the same delta format as the
+        input. Its primary function is the LIS algorithm which
+        calculates the longest increasing subset of alignments. This
+        allows for the calculation of a global set of alignments
+        (i.e. 1-to-1 and mutually consistent order) with the -g option
+        or locally consistent with -1 or -m. Reference sequences can
+        be mapped to query sequences with -r, or queries to references
+        with -q. This allows the user to exclude chance and repeat
+        induced alignments, leaving only the "best" alignments between
+        the two data sets. Filtering can also be performed on length,
+        identity, and uniqueness.
+
+        USAGE:
+        delta-filter  [options]  <deltafile>
+
+        [options]    type 'delta-filter -h' for a list of options.
+        <deltafile>  the .delta output file from either nucmer or promer.
+
+        OUTPUT:
+        stdout  The same delta alignment format as output by nucmer and promer.
+
+        NOTES:
+        For most cases the -m option is recommended; however, -1 is
+        useful for applications that require a 1-to-1 mapping, such as
+        SNP finding. Use the -q option for mapping query contigs to
+        their best reference location.
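
To illustrate the "longest increasing subset" idea mentioned in the description above, here is a minimal Python sketch.  It is not delta-filter's implementation: the record layout (reference start, query start, length), the function name and the choice to weight each alignment by its length alone are all assumptions made for the example, and overlaps between alignments are ignored.

```python
from typing import List, Tuple

# Hypothetical simplified record: (ref_start, qry_start, length).
Alignment = Tuple[int, int, int]


def heaviest_increasing_subset(alns: List[Alignment]) -> List[Alignment]:
    """Return the chain of alignments that increases in both reference and
    query start position and maximizes total length (O(n^2) dynamic program)."""
    if not alns:
        return []
    alns = sorted(alns)                  # order by reference start
    best = [a[2] for a in alns]          # best chain weight ending at i
    prev = [-1] * len(alns)              # back-pointers for reconstruction
    for i, (ri, qi, li) in enumerate(alns):
        for j in range(i):
            rj, qj, _ = alns[j]
            # j may precede i only if it starts earlier in both sequences.
            if rj < ri and qj < qi and best[j] + li > best[i]:
                best[i] = best[j] + li
                prev[i] = j
    # Walk back from the heaviest chain end.
    i = max(range(len(alns)), key=best.__getitem__)
    chain = []
    while i != -1:
        chain.append(alns[i])
        i = prev[i]
    return chain[::-1]


# Made-up data: the second alignment is out of order in the query.
example = [(1, 1, 100), (150, 900, 40), (200, 180, 120), (400, 390, 80)]
chain = heaviest_increasing_subset(example)
print(chain)
```

In this made-up input the alignment at query position 900 is dropped because keeping it would break the increasing order in the query.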
+
+
+   ** exact-tandems **
+
+        DESCRIPTION:
+        This script finds exact tandem repeats in a specified FastA sequence
+        file.  It is a post-processor for 'repeat-match' and provides a simple
+        interface and output for tandem repeat detection.
+
+        USAGE:
+        exact-tandems  <file>  <min match>
+
+        <file>       the single sequence in FastA format to search for repeats.
+        <min match>  the minimum match length for the tandems.
+
+        OUTPUT:
+        stdout  4 columns, the start of the tandem repeat, the total extent
+                of the repeat region, the length of each repetitive unit, and
+                the total number of copies of the repetitive unit involved.
+
+
+   ** gaps **
+
+        DESCRIPTION:
+        This program reads a list of unique matches between two strings and
+        outputs the longest consistent set of matches, followed by all the
+        other matches.  It is part of the MUMmer1.0 pipeline, and the output of
+        the 'mummer' program must be processed (to strip all non-match lines)
+        before it can be passed to this program.
+
+        USAGE:
+        gaps  <seq1>  [-r]  <  <matchlist>
+
+        <seq1>       The first sequence file that the match list represents.
+        <matchlist>  A simple list of matches and NO header lines or other
+                     mumbo jumbo.  The columns of the match list should be
+                     start in the reference, start in the query, and length
+                     of the match.
+        [-r]         Simply puts the string "reverse" on the header of the
+                     output so 'annotate' knows to reverse the second
+                     sequence.
+
+        OUTPUT:
+        stdout  an ordered set of the input matches, separated by headers.
+                The first set is the longest consistent set of matches and
+                the second set is all other matches.
+
+        NOTES:
+        This program will eventually be rewritten to be interchangeable with
+        'mgaps', so that it may be plugged into the nucmer or promer
+        pipelines.
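
The pre-processing step described above (stripping all non-match lines before passing a match list to 'gaps') could be sketched in Python as follows.  This is only an illustration: it assumes a match line is exactly three whitespace-separated non-negative integers (start in reference, start in query, length) and treats every other line as a non-match line.  The function name and the sample text are invented for the example.

```python
from typing import Iterable, Iterator, Tuple


def match_lines(lines: Iterable[str]) -> Iterator[Tuple[int, int, int]]:
    """Yield (ref_start, qry_start, length) for lines made of three integers;
    skip everything else (headers, blank lines, other text)."""
    for line in lines:
        fields = line.split()
        if len(fields) == 3 and all(f.isdigit() for f in fields):
            yield int(fields[0]), int(fields[1]), int(fields[2])


# Made-up sample resembling a header followed by two matches.
sample = """> query1
      1000      2000        25

      1500      2500        40
"""

matches = list(match_lines(sample.splitlines()))
for ref_start, qry_start, length in matches:
    print(ref_start, qry_start, length)
```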
+
+
+   ** mapview **
+
+        DESCRIPTION:
+        mapview is a utility program for displaying sequence alignments as
+        provided by MUMmer, nucmer or promer. This program takes the output
+        from these alignment routines and converts it to a FIG, PDF or PS
+        file for visual analysis. It can also break the output into multiple
+        files for easier viewing and printing. Please refer to
+        "docs/mapview.README" for a more detailed description and explanation.
+
+        USAGE:
+        mapview  [options]  <coords file>  [UTR coords]  [CDS coords]
+
+        [options]       type 'mapview -h' for a list of options.
+        <coords file>   show-coords output file
+        [UTR coords]    UTR coordinate file in GFF format
+        [CDS coords]    CDS coordinate file in GFF format
+
+        OUTPUT:
+        Default output format is an xfig file; however, this can be changed to
+        a PostScript or PDF file with the -f option. See 'mapview -h' for a
+        list of available formatting options.
+
+        NOTES:
+        To produce the coords file input, 'show-coords' must be run with the
+        -r -l options. To reduce redundant matches in promer output, run
+        show-coords with the -k option. To generate output formats other than
+        xfig, the fig2dev utility must be available from the system path. For
+        very large reference genomes, FIG format may be the only option that
+        will allow the entire display to be stored in one file, as fig2dev has
+        problems if the output is too large.
+
+
+   ** mgaps **
+
+        DESCRIPTION:
+        This program reads a list of matches between a single-FastA reference
+        and a multi-FastA query file and outputs clusters of matches that lie
+        on similar diagonals and within a reasonable distance.  It is part of
+        the MUMmer3.0 pipeline, and the output of 'mummer' need not be processed
+        before passing it to this program, so long as 'mummer' was run on a
+        1-vs-many or 1-vs-1 dataset.
+
+        USAGE:
+        mgaps  [options]  <  <matchlist>
+
+        [options]    type 'mgaps -h' for a list of options.
+        <matchlist>  A list of matches separated by their sequence FastA tags.
+                     The columns of the match list should be start in
+                     reference, start in query, and length of the match.
+
+        OUTPUT:
+        stdout  An ordered set of the input matches, separated by headers.
+                Individual clusters are separated by a '#' character and
+                sets of clusters from different sequences are separated by
+                the FastA header tag for the query sequence.
+
+        NOTES:
+        It is often very helpful to adjust the clustering parameters.  Check
+        'mgaps -h' for the list of parameters and check the source for a
+        better idea of how each parameter affects the result.  Often, it is
+        helpful to run this program a number of times with different
+        parameters until the desired result is achieved.
+
+
+   ** mummer **
+
+        DESCRIPTION:
+        This is the core program of the MUMmer package.  It is the suffix-tree
+        based match finding routine, and the main part of every MUMmer script.
+        For a detailed manual describing how to use this program, please refer
+        to "docs/maxmat3man.pdf" or in LaTeX format "docs/maxmat3man.tex". By
+        default, 'mummer' now finds maximal matches regardless of their
+        uniqueness. Limiting the output to only unique matches can be specified
+        as a command line switch.
+
+        USAGE:
+        mummer  [options]  <reference>  <query> ...
+
+        [options]    type 'mummer -help' for a list of options.
+        <reference>  specifies the single or multi-FastA sequence file that
+                     contains the reference sequence(s), to be aligned with
+                     the queries.
+        <query>      specifies the multi-FastA sequence file that contains
+                     the query sequences, to be aligned with the references.
+                     Multiple query files are allowed, up to 32.
+
+        OUTPUT:
+        stdout  a list of exact matches. Varies depending on input, refer to
+                the manual specified in the description above.
+
+        NOTES:
+        Many thanks to Stefan Kurtz for the latest mummer version. 'mummer'
+        now behaves like the old 'mummer2' program by default. The -mum switch
+        forces it to behave like 'mummer1', the -mumreference switch forces it
+        to behave like 'mummer2' while the -maxmatch switch forces it to behave
+        like the old 'max-match' program.
+
+
+   ** mummerplot **
+
+        DESCRIPTION:
+        mummerplot is a perl script that generates gnuplot scripts and data
+        collections for plotting with the gnuplot utility.  It can generate
+        2-d dotplots and 1-d coverage plots for the output of mummer, nucmer,
+        promer or show-tiling. It can also color dotplots with an identity
+        color gradient.
+
+        USAGE:
+        mummerplot  [options]  <matchfile>
+
+        [options]    type 'mummerplot -h' for a list of options.
+        <matchfile>  the output of 'mummer', 'nucmer', 'promer', or
+                     'show-tiling'. 'mummerplot' will automatically determine
+                     the format of the data it was given and produce the plot
+                     accordingly.
+
+        OUTPUT:
+        out.gp     The gnuplot script; type 'gnuplot out.gp' to evaluate the
+                   gnuplot script.
+        out.fplot
+        out.rplot
+        out.hplot  The forward, reverse and highlighted match information for
+                   plotting with gnuplot.
+
+        out.ps
+        out.png    The plotted image file, postscript or png depending on the
+                   selected terminal type.
+
+        NOTES:
+        For alignments with multiple reference or query sequences, be sure to
+        use the -r -q or -R -Q options to avoid overlaying multiple plots in
+        the same space. For better looking color gradient plots, try the
+        postscript terminal and avoid the png terminal.
+
+
+   ** nucmer2xfig **
+
+        DESCRIPTION:
+        Script for plotting nucmer hits against a reference sequence. See top
+        of script for more information, or see if 'mummerplot' or 'mapview'
+        has the functionality required as they are properly maintained.
+
+
+   ** repeat-match **
+
+        DESCRIPTION:
+        Finds exact repeats within a single sequence.
+
+        USAGE:
+        repeat-match  [options]  <seq>
+
+        [options]  type 'repeat-match -h' for a list of options.
+        <seq>      the single sequence in FastA format to search for repeats.
+
+        OUTPUT:
+        stdout  3 columns, the start of the first copy of the repeat, the
+                start of the second copy of the repeat, and the length of the
+                repeat respectively.
+
+        NOTES:
+        REPuter (freely available for universities) may be better suited for
+        most repeat matching, but 'repeat-match' is open-source and has some
+        functionality that REPuter does not so we include it along with the
+        MUMmer package.
+
+
+   ** show-aligns **
+
+        DESCRIPTION:
+        This program parses the delta alignment output of nucmer and promer
+        and displays all of the pairwise alignments from the two sequences
+        specified on the command line.
+
+        USAGE:
+        show-aligns  [options]  <deltafile>  <IdR>  <IdQ>
+
+        [options]    type 'show-aligns -h' for a list of options.
+        <deltafile>  the .delta output file from either nucmer or promer.
+        <IdR>        the FastA header tag of the desired reference sequence.
+        <IdQ>        the FastA header tag of the desired query sequence.
+
+        OUTPUT:
+        stdout  each alignment header and footer describes the frame of the
+                alignment in each sequence, and the start and finish
+                (inclusive) of the alignment in each sequence.  At the
+                beginning of each line of aligned sequence are two numbers, the
+                top is the coordinate of the first reference base on that line
+                and the bottom is the coordinate of the first query base on
+                that line.  ALL coordinates reference the forward strand of the
+                DNA sequence, even if it is a protein alignment.  A gap caused
+                by an insertion or deletion is filled with a '.' character.
+                Errors in a DNA alignment are marked with a '^' below the
+                error.  Errors in an amino acid alignment are marked with a
+                whitespace in the middle consensus line, while matches are
+                marked with the consensus base and similarities are marked with
+                a '+' in the consensus line.
+
+
+   ** show-coords **
+
+        DESCRIPTION:
+        This program parses the delta alignment output of nucmer and promer
+        and displays the coordinates, and other useful information about the
+        alignments.
+
+        USAGE:
+        show-coords  [options]  <deltafile>
+
+        [options]    type 'show-coords -h' for a list of options.
+        <deltafile>  the .delta output file from either nucmer or promer.
+
+        OUTPUT:
+        stdout  run 'show-coords' without the -H option to see the column
+                header tags.  Here is a description of each tag.  Note that
+                some of the below tags do not apply to nucmer data, and that
+                all coordinates are inclusive and relative to the forward DNA
+                strand.
+
+        [S1]    Start of the alignment region in the reference sequence.
+
+        [E1]    End of the alignment region in the reference sequence.
+
+        [S2]    Start of the alignment region in the query sequence.
+
+        [E2]    End of the alignment region in the query sequence.
+
+        [LEN 1] Length of the alignment region in the reference sequence,
+        measured in nucleotides.
+
+        [LEN 2] Length of the alignment region in the query sequence, measured
+        in nucleotides.
+
+        [% IDY] Percent identity of the alignment, calculated as
+        (number of exact matches) / ([LEN 1] + insertions in the query).
+
+        [% SIM] Percent similarity of the alignment, calculated like the above
+        value, but counting positive BLOSUM matrix scores instead of exact
+        matches.
+
+        [% STP] Percent of stop codons of the alignment, calculated as
+        (number of stop codons) / (([LEN 1] + insertions in the query) * 2).
+
+        [LEN R] Length of the reference sequence.
+
+        [LEN Q] Length of the query sequence.
+
+        [COV R] Percent coverage of the alignment on the reference sequence,
+        calculated as [LEN 1] / [LEN R].
+
+        [COV Q] Percent coverage of the alignment on the query sequence,
+        calculated as [LEN 2] / [LEN Q].
+
+        [FRM]   Reading frame for the reference sequence and the reading frame
+        for the query sequence respectively.  This is one of the columns
+        absent from the nucmer data; however, match direction can easily be
+        determined from the start and end coordinates.
+
+        [TAGS]  The reference FastA ID and the query FastA ID.
+
+                There is also an optional final column (turned on with the -w
+        or -o option) that will contain some 'annotations'.  The -o option will
+        annotate alignments that represent overlaps between two sequences,
+        while the -w option is antiquated and should no longer be used.
+        Sometimes, nucmer or promer will extend adjacent clusters past one
+        another, causing somewhat redundant output; this option will
+        notify users of such rare occurrences.
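+
+        The ratios given for [% IDY], [% STP], [COV R] and [COV Q] can be
+        written out as below.  This is a sketch of the arithmetic only: the
+        function and parameter names are hypothetical, and scaling each
+        ratio by 100 is an assumption based on the word "Percent".

```python
def percent_identity(exact_matches: int, len1: int, query_insertions: int) -> float:
    """[% IDY]: exact matches / ([LEN 1] + insertions in the query), as a percent."""
    return 100.0 * exact_matches / (len1 + query_insertions)


def percent_stop(stop_codons: int, len1: int, query_insertions: int) -> float:
    """[% STP]: stop codons / (([LEN 1] + insertions in the query) * 2), as a percent."""
    return 100.0 * stop_codons / ((len1 + query_insertions) * 2)


def percent_coverage(aligned_length: int, sequence_length: int) -> float:
    """[COV R] or [COV Q]: aligned length / full sequence length, as a percent."""
    return 100.0 * aligned_length / sequence_length


if __name__ == "__main__":
    # Made-up numbers, chosen only to exercise the formulas.
    print(percent_identity(exact_matches=950, len1=990, query_insertions=10))
    print(percent_stop(stop_codons=4, len1=990, query_insertions=10))
    print(percent_coverage(aligned_length=500, sequence_length=2000))
```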
+
+        NOTES:
+        The -c and -l options are useful when comparing two sets of assembly
+        contigs, in that these options help determine if an alignment spans an
+        entire contig, or is just a partial hit to a different read.  The -b
+        option is useful when the user wishes to identify syntenic regions
+        between two genomes, but is not particularly interested in the actual
+        alignment similarity or appearance.  This option also disregards match
+        orientation, so it should not be used if this information is needed.
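+
+        As an illustration of how the coverage and coordinate columns might
+        be used downstream, the sketch below checks whether an alignment
+        spans (nearly) an entire contig and infers match direction from the
+        start and end coordinates.  The function names and the 99 percent
+        threshold are hypothetical choices, not part of show-coords.

```python
def match_direction(start: int, end: int) -> int:
    """Return 1 for a forward match (start <= end) and -1 for a reverse match."""
    return 1 if start <= end else -1


def spans_whole_sequence(aligned_length: int, sequence_length: int,
                         min_coverage: float = 99.0) -> bool:
    """Return True if the alignment covers at least min_coverage percent
    of the sequence.  The default threshold is an arbitrary assumption."""
    coverage = 100.0 * aligned_length / sequence_length
    return coverage >= min_coverage


if __name__ == "__main__":
    # Made-up query coordinates [S2], [E2] and lengths [LEN 2], [LEN Q].
    s2, e2, len2, len_q = 2000, 11, 1990, 2000
    print("direction:", match_direction(s2, e2))
    print("spans contig:", spans_whole_sequence(len2, len_q))
```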
+
+
+   ** show-diff **
+
+        DESCRIPTION:
+        This program classifies alignment breakpoints for the
+        quantification of macroscopic differences between two
+        genomes. It takes a standard, unfiltered delta file as input,
+        determines the best mapping between the two sequence sets, and
+        reports on the breaks in that mapping.
+
+        USAGE:
+        show-diff  [options]  <deltafile>
+
+        [options]    type 'show-diff -h' for a list of options.
+        <deltafile>  the .delta output file from nucmer.
+
+        OUTPUT:
+        stdout  Classified breakpoints are output one per line with
+                the following types and column definitions. The first
+                five columns of every row are seq ID, feature type,
+                feature start, feature end, and feature length.
+
+        Feature Columns
+
+        IDR GAP gap-start gap-end gap-length-R gap-length-Q gap-diff
+        IDR DUP dup-start dup-end dup-length
+        IDR BRK gap-start gap-end gap-length
+        IDR JMP gap-start gap-end gap-length
+        IDR INV gap-start gap-end gap-length
+        IDR SEQ gap-start gap-end gap-length prev-sequence next-sequence
+
+        Feature Types
+
+        [GAP] A gap between two mutually consistent ordered and
+        oriented alignments. gap-length-R is the length of the
+        alignment gap in the reference, gap-length-Q is the length of
+        the alignment gap in the query, and gap-diff is the difference
+        between the two gap lengths. If gap-diff is positive, sequence
+        has been inserted in the reference. If gap-diff is negative,
+        sequence has been deleted from the reference. If both
+        gap-length-R and gap-length-Q are negative, the indel is a
+        tandem duplication copy difference.
+
+        [DUP] A duplicated sequence in the reference that occurs more
+        times in the reference than in the query. The coordinate
+        columns specify the bounds and length of the
+        duplication. These features are often bookended by BRK
+        features if there is unique sequence bounding the duplication.
+
+        [BRK] An insertion in the reference of unknown origin, that
+        indicates no query sequence aligns to the sequence bounded by
+        gap-start and gap-end. Often found around DUP elements or at
+        the beginning or end of sequences.
+
+        [JMP] A relocation event, where the consistent ordering of
+        alignments is disrupted. The coordinate columns specify the
+        breakpoints of the relocation in the reference, and the
+        gap-length between them. A negative gap-length indicates the
+        relocation occurred around a repetitive sequence, and a
+        positive length indicates unique sequence between the
+        alignments.
+
+        [INV] The same as a relocation event, except that both the
+        ordering and orientation of the alignments are disrupted. Note
+        that for JMP and INV, generally two features will be output,
+        one for the beginning of the inverted region, and another for
+        the end of the inverted region.
+
+        [SEQ] A translocation event that requires jumping to a new
+        query sequence in order to continue aligning to the
+        reference. If each input sequence is a chromosome, these
+        features correspond to inter-chromosomal translocations.
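+
+        The sign rules stated for GAP and JMP features can be expressed as
+        a small decision function.  This is a sketch under assumptions: the
+        function names are hypothetical, the tandem duplication test is
+        checked before the sign of gap-diff, and the zero cases, which the
+        text above does not cover, are reported as "unspecified".

```python
def describe_gap(gap_length_r: int, gap_length_q: int, gap_diff: int) -> str:
    """Interpret the last three columns of a GAP row."""
    if gap_length_r < 0 and gap_length_q < 0:
        return "tandem duplication copy difference"
    if gap_diff > 0:
        return "sequence inserted in the reference"
    if gap_diff < 0:
        return "sequence deleted from the reference"
    return "unspecified"


def describe_jmp(gap_length: int) -> str:
    """Interpret the gap-length column of a JMP row."""
    if gap_length < 0:
        return "relocation around a repetitive sequence"
    if gap_length > 0:
        return "unique sequence between the alignments"
    return "unspecified"


if __name__ == "__main__":
    # Made-up values for illustration.
    print(describe_gap(gap_length_r=51, gap_length_q=40, gap_diff=11))
    print(describe_gap(gap_length_r=-30, gap_length_q=-12, gap_diff=-18))
    print(describe_jmp(gap_length=-19))
```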
+
+        NOTES:
+        The estimated number of features of a given type (inversions, for
+        example) represents the number of breakpoints classified as
+        bordering that feature. Therefore, since there will be a breakpoint at
+        both the beginning and the end of an inversion, the feature
+        counts are roughly double the number of inversion events. In
+        addition, all counts are estimates and do not represent the
+        exact number of each evolutionary event.
+
+        Summing the fifth column (ignoring negative values) yields an
+        estimate of the total inserted sequence in the
+        reference. Summing the fifth column after removing DUP
+        features yields an estimate of the total amount of unique
+        (unaligned) sequence in the reference. Note that unaligned
+        sequences are not counted, and could represent additional
+        "unique" sequences. Use the 'dnadiff' script if you must
+        recover this information. Finally, the -q option switches
+        references for queries, and uses the query coordinates for the
+        analysis.
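+
+        The fifth-column sums described above could be computed from saved
+        show-diff output roughly as follows.  This sketch assumes rows are
+        whitespace-delimited and that negative lengths are also ignored
+        once DUP features are removed; the function name and the sample
+        rows are made up for illustration.

```python
from typing import Dict, Iterable


def summarize_show_diff(rows: Iterable[str]) -> Dict[str, int]:
    """Sum the fifth column (feature length) of show-diff style rows.

    Assumes whitespace-delimited rows whose first five columns are
    seq ID, feature type, feature start, feature end, feature length.
    """
    inserted = 0          # all positive lengths
    unique = 0            # positive lengths, DUP features removed
    inv_breakpoints = 0   # roughly double the number of inversion events
    for row in rows:
        fields = row.split()
        if len(fields) < 5:
            continue  # skip blank or malformed rows
        feature_type = fields[1]
        length = int(fields[4])
        if feature_type == "INV":
            inv_breakpoints += 1
        if length <= 0:
            continue  # negative values are ignored when summing
        inserted += length
        if feature_type != "DUP":
            unique += length
    return {
        "inserted": inserted,
        "unique": unique,
        "inv_breakpoints": inv_breakpoints,
    }


# Hypothetical rows, not real program output.
SAMPLE_ROWS = [
    "chr1 GAP 100 150 51 40 11",
    "chr1 DUP 300 499 200",
    "chr1 BRK 1 20 20",
    "chr1 JMP 900 880 -19",
    "chr1 INV 1000 1010 11",
    "chr1 INV 2000 2005 6",
]

if __name__ == "__main__":
    print(summarize_show_diff(SAMPLE_ROWS))
```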
+
+
+   ** show-snps **
+
+        DESCRIPTION:
+        This program reports polymorphisms contained in a delta encoded
+        alignment file output by either nucmer or promer. It catalogs
+        all of the single nucleotide polymorphisms (SNPs) and
+        insertions/deletions within the delta file
+        alignments. Polymorphisms are reported one per line, in a
+        delimited fashion similar to show-coords. Pairing this program
+        with the appropriate MUMmer tools can create an easy to use
+        SNP pipeline for the rapid identification of putative SNPs
+        between any two sequence sets.
+
+        USAGE:
+        show-snps  [options]  <deltafile>
+
+        [options]    type 'show-snps -h' for a list of options.
+        <deltafile>  the .delta output file from either nucmer or promer.
+
+        OUTPUT:
+        stdout  Standard output has column headers with the following
+                meanings. Not all columns will be output by default;
+                see 'show-snps -h' for the switches that control the output.
+
+        [P1]    SNP position in the reference.
+
+        [SUB]   Character in the reference.
+
+        [SUB]   Character in the query.
+
+        [P2]    SNP position in the query.
+
+        [BUFF]  Distance from this SNP to the nearest mismatch (end of
+        alignment, indel, SNP, etc) in the same alignment.
+
+        [DIST]  Distance from this SNP to the nearest sequence end.
+
+        [R]     Number of repeat alignments which cover this reference
+        position, >0 means repetitive sequence.
+
+        [Q]     Number of repeat alignments which cover this query
+        position, >0 means repetitive sequence.
+
+        [LEN R] Length of the reference sequence.
+
+        [LEN Q] Length of the query sequence.
+
+        [CTX R] Surrounding context sequence in the reference.
+
+        [CTX Q] Surrounding context sequence in the query.
+
+        [FRM]   Reading frame for the reference sequence and the
+        reading frame for the query sequence respectively. Simply
+        'forward' 1, or 'reverse' -1 for nucmer data.
+
+        [TAGS]  The reference FastA ID and the query FastA ID.
+
+        NOTES:
+        It is often helpful to run this with the -C option to ensure
+        that SNPs are reported only from uniquely aligned regions.
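+
+        To show how the [BUFF], [R] and [Q] columns might be used after the
+        fact, the sketch below filters SNP records that have already been
+        loaded into memory.  It is not a reimplementation of the -C option;
+        the record type, the function name and the 20 base threshold are
+        all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Snp:
    """One SNP record, holding a subset of the columns described above."""
    p1: int        # [P1]  position in the reference
    ref_base: str  # first [SUB] column, character in the reference
    qry_base: str  # second [SUB] column, character in the query
    p2: int        # [P2]  position in the query
    buff: int      # [BUFF] distance to the nearest mismatch
    r: int         # [R]   repeat alignments covering the reference position
    q: int         # [Q]   repeat alignments covering the query position


def keep_confident(snps: List[Snp], min_buff: int = 20) -> List[Snp]:
    """Keep SNPs outside repetitive sequence (R == 0 and Q == 0) that are
    at least min_buff bases from the nearest mismatch.  The threshold is
    an arbitrary assumption."""
    return [s for s in snps if s.r == 0 and s.q == 0 and s.buff >= min_buff]


if __name__ == "__main__":
    # Made-up records for illustration.
    records = [
        Snp(p1=120, ref_base="a", qry_base="g", p2=118, buff=45, r=0, q=0),
        Snp(p1=300, ref_base="c", qry_base="t", p2=298, buff=3, r=0, q=0),
        Snp(p1=950, ref_base="t", qry_base=".", p2=947, buff=60, r=2, q=0),
    ]
    for snp in keep_confident(records):
        print(snp.p1, snp.ref_base, snp.qry_base, snp.p2)
```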
+
+
+   ** show-tiling **
+
+        DESCRIPTION:
+        This program attempts to construct a tiling path out of the query
+        contigs as mapped to the reference sequences.  Given the delta
+        alignment information of a few long reference sequences and many small
+        query contigs, 'show-tiling' will determine the best location on a
+        reference for each contig.  Note that each contig may only be tiled
+        once, so repetitive regions may cause this program some difficulty.
+        This program is useful for aiding in the scaffolding and closure of an
+        unfinished set of contigs, if a suitable, high similarity, reference
+        genome is available.  Or, if using promer, 'show-tiling' will help
+        in the identification of syntenic regions and the mapping of their
+        contigs to the references.
+
+        USAGE:
+        show-tiling  [options]  <deltafile>
+
+        [options]    type 'show-tiling -h' for a list of options.
+        <deltafile>  the .delta output file from either nucmer or promer.
+
+        OUTPUT:
+        stdout  Standard output has 8 columns: start in reference, end in
+                reference, gap between this contig and the next, length of this
+                contig, alignment coverage of this contig, average percent
+                identity of the alignments for this contig, orientation of this
+                contig, contig ID. All matches to a reference are headed by the
+                FASTA tag of that reference.  Output with the -a option is the
+                same as 'show-coords -cl' when run on nucmer data.
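+
+        The 8-column layout described above could be read into Python
+        roughly as follows.  This is a sketch under assumptions: it
+        supposes that each reference header line begins with '>' followed
+        by the reference ID and that data rows are whitespace-delimited.
+        The function name, dictionary keys and sample lines are
+        hypothetical.

```python
from typing import Dict, Iterable, List


def parse_tiling(lines: Iterable[str]) -> Dict[str, List[dict]]:
    """Group tiling rows under the reference header that precedes them.

    Assumes header lines start with '>' and data rows have 8
    whitespace-delimited columns in the order listed above.
    """
    tiles: Dict[str, List[dict]] = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            parts = line[1:].split()
            current = parts[0] if parts else ""
            tiles[current] = []
            continue
        if current is None:
            raise ValueError("data row found before any reference header")
        fields = line.split()
        if len(fields) != 8:
            raise ValueError("expected 8 columns, got %d" % len(fields))
        tiles[current].append({
            "ref_start": int(fields[0]),
            "ref_end": int(fields[1]),
            "gap_to_next": int(fields[2]),
            "contig_length": int(fields[3]),
            "coverage": float(fields[4]),
            "identity": float(fields[5]),
            "orientation": fields[6],
            "contig_id": fields[7],
        })
    return tiles


# Hypothetical lines, not real program output.
SAMPLE_TILING = [
    ">ref1 5000 bases",
    "1 1000 50 1000 100.00 99.50 + contigA",
    "1051 1800 -1 750 98.00 97.20 - contigB",
]

if __name__ == "__main__":
    for ref_id, rows in parse_tiling(SAMPLE_TILING).items():
        print(ref_id, [row["contig_id"] for row in rows])
```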
+
+        NOTES:
+        When run with the -x option, 'show-tiling' will produce an XML output
+        format that can be accepted by TIGR's open source scaffolding software
+        'Bambus' as contig linking information.
+
+
+-- CONTACT INFORMATION --
+
+Please address questions and bug reports to: <mummer-help@lists.sourceforge.net>
+
+Last Revised May 12, 2005