csp2: CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/docs/dnadiff.README comparison

comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/docs/dnadiff.README @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 17:55:14 -0400

-:0e9998148a16
+:33d812a61356
+--------------------------------------------------------------------------------
+dnadiff is a wrapper for nucmer and analysis utilities that provides
+detailed information on the differences between two genomes, and also
+provides a high level report file that quantifies the differences
+between the two inputs.
+Use Cases:
++ diff'ing two strains of the same species
++ diff'ing two assemblies of the same organism
++ diff'ing a draft assembly and a closely related finished genome
+If any of this code is used in any publication, please cite the following:
+Versatile and open software for comparing large genomes.
+S. Kurtz, A. Phillippy, A.L. Delcher,
+M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
+Genome Biology (2004), 5:R12.
+--------------------------------------------------------------------------------
+This manual is also available as HTML documentation included in this
+distribution, or at:
+http://mummer.sourceforge.net
+http://mummer.sourceforge.net/manual
+http://mummer.sourceforge.net/examples
+-- DESCRIPTION --
+dnadiff is a wrapper around nucmer that builds an alignment using
+default parameters, and runs many of nucmer's helper scripts to
+process the output and report alignment statistics, SNPs, breakpoints,
+etc. It is designed for evaluating the sequence and structural
+similarity of two highly similar sequence sets, e.g. comparing two
+different assemblies of the same organism, or comparing two strains of
+the same species.
+-- dnadiff EXAMPLE --
+To compare two strains of the same species, type:
+"dnadiff genome1.fna genome2.fna"
+The output files will be:
+out.report  - Summary of alignments, differences and SNPs
+out.delta   - Standard nucmer alignment output
+out.1delta  - 1-to-1 alignment from delta-filter -1
+out.mdelta  - M-to-M alignment from delta-filter -m
+out.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
+out.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
+out.snps    - SNPs from show-snps -rlTHC .1delta
+out.rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
+out.qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
+out.unref   - Unaligned reference sequence IDs and lengths
+out.unqry   - Unaligned query sequence IDs and lengths
+For more information on the formats and meanings of all the files
+produced, please see the documentation for the corresponding
+utility. This document serves to describe running the dnadiff script
+and interpreting the produced .report file.
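The file list above lends itself to a quick completeness check after a run. The following is a minimal sketch, assuming only that each output is named `<prefix>.<suffix>` with the suffixes listed above; the function names `expected_outputs` and `missing_outputs` are hypothetical helpers and are not part of MUMmer.

```python
from pathlib import Path

# Suffixes taken from the output list in this README.
DNADIFF_SUFFIXES = [
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
]


def expected_outputs(prefix="out", directory="."):
    """Return the paths dnadiff is documented to write for a given prefix."""
    base = Path(directory)
    return [base / f"{prefix}.{suffix}" for suffix in DNADIFF_SUFFIXES]


def missing_outputs(prefix="out", directory="."):
    """Return the expected output paths that do not exist on disk."""
    return [p for p in expected_outputs(prefix, directory) if not p.exists()]
```

Calling `missing_outputs("A_B")` after a run with `-p A_B` returns an empty list if every documented file is present in the current directory.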
+-- RUNNING 'dnadiff' --
+USAGE: dnadiff  [options]  <Reference>  <Query>
+or   dnadiff  [options]  -d <Delta File>
+DESCRIPTION:
+Run comparative analysis of two sequence sets using nucmer and its
+associated utilities with recommended parameters. See MUMmer
+documentation for a more detailed description of the
+output. Produces the following output files:
+.delta   - Standard nucmer alignment output
+.1delta  - 1-to-1 alignment from delta-filter -1
+.mdelta  - M-to-M alignment from delta-filter -m
+.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
+.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
+.snps    - SNPs from show-snps -rlTHC .1delta
+.rdiff   - Classified alignment breakpoints from show-diff -rH .mdelta
+.qdiff   - Classified alignment breakpoints from show-diff -qH .mdelta
+.report  - Summary of alignments, differences and SNPs
+.unref   - Unaligned reference sequence IDs and lengths
+.unqry   - Unaligned query sequence IDs and lengths
+MANDATORY:
+Reference       Set the input reference multi-FASTA filename
+Query           Set the input query multi-FASTA filename
+or
+Delta File      Unfiltered .delta alignment file from nucmer
+OPTIONS:
+-d|delta        Provide precomputed delta file for analysis
+-h
+--help          Display help information and exit
+-p|prefix       Set the prefix of the output files (default "out")
+-V
+--version       Display the version information and exit
+-- NOTES --
+The -p option is recommended to avoid overwriting previous
+output. A simple naming convention for input files A.fna and B.fna is
+to set "-p A_B". It is safest to let dnadiff run nucmer automatically,
+so avoid the -d option unless the delta file was already generated
+with "nucmer --maxmatch" and has not been filtered.
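The naming convention in the note above can be automated. This is a sketch, not part of dnadiff itself: the helper names `default_prefix` and `dnadiff_command` are hypothetical, and the only dnadiff option used is the documented `-p`.

```python
from pathlib import Path


def default_prefix(reference, query):
    """Build an 'A_B' style prefix from the two input file names."""
    return f"{Path(reference).stem}_{Path(query).stem}"


def dnadiff_command(reference, query, prefix=None):
    """Return the argument list for a dnadiff run with an explicit prefix."""
    if prefix is None:
        prefix = default_prefix(reference, query)
    # Options come before the two positional inputs, as in the USAGE line.
    return ["dnadiff", "-p", prefix, str(reference), str(query)]
```

The returned list can be passed to `subprocess.run(cmd, check=True)` on a system where dnadiff is on the PATH. Deriving the prefix from both file names means two comparisons overwrite each other only if they use the same pair of input names.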
+-- OUTPUT FILES --
+dnadiff produces many outputs, but all except one are produced by
+other utilities in the MUMmer package. Please see their corresponding
+documentation for more information. This section describes only
+the .report file generated by dnadiff and gives tips on interpreting it.
+*** .report OUTPUT ***
+Report statistics are split into two columns, reference and
+query. Rows are grouped by alignment metric theme and are described
+here. Summary counts are estimates and do not represent the exact
+number of occurrences of a particular evolutionary event. Read a
+reference column as "number of XYZ in the reference with respect to
+the query", and a query column as "number of XYZ in the query
+with respect to the reference".
+[Sequences]    - Sequence-centric stats.
+TotalSeqs      - Total number of input sequences.
+AlignedSeqs    - Number of input sequences with at least one alignment.
+UnalignedSeqs  - Number of input sequences with no alignment.
+[Bases]        - Base-pair-centric stats.
+TotalBases     - Total number of bases in the input sequences.
+AlignedBases   - Total number of bases contained within an alignment.
+UnalignedBases - Total number of unaligned bases. This is a rough
+measure for the amount of "unique" sequence in the
+reference and query.
+[Alignments]   - Alignment-centric stats.
+1-to-1         - Number of alignment blocks comprising the 1-to-1
+mapping of reference to query. This is a subset of
+the M-to-M mapping, with repeats removed.
+TotalLength    - Total length of 1-to-1 alignment blocks.
+AvgLength      - Average length of 1-to-1 alignment blocks.
+AvgIdentity    - Average identity of 1-to-1 alignment blocks.
+M-to-M         - Number of alignment blocks comprising the
+many-to-many mapping of reference to query. The
+M-to-M mapping represents the smallest set of
+alignments that maximize the coverage of both
+reference and query. This is a superset of the 1-to-1
+mapping.
+TotalLength    - Total length of M-to-M alignment blocks.
+AvgLength      - Average length of M-to-M alignment blocks.
+AvgIdentity    - Average identity of M-to-M alignment blocks.
+[Features]     - Structural alignment features, such as
+rearrangements. These counts are rough estimates
+based on an automated analysis of the
+alignments. Features are identified by scanning the
+reference (or query) from low to high, and noting the
+positions where the query alignments are
+inconsistently ordered or oriented with respect to
+the reference.
+Breakpoints    - Number of non-maximal alignment endpoints,
+i.e. endpoints that do not occur at the beginning or
+end of a sequence.
+Relocations    - Number of breaks in the alignment where adjacent
+1-to-1 alignment blocks are in the same sequence, but
+not consistently ordered. A separate feature is
+recorded for each end of a relocation, so this is
+really a count of relocation endpoints.
+Translocations - Number of breaks in the alignment where adjacent
+1-to-1 alignment blocks are in different sequences. A
+separate feature is recorded for each end of a
+translocation, so this is really a count of
+translocation endpoints.
+Inversions     - Number of breaks in the alignment where adjacent
+1-to-1 alignment blocks are inverted with respect to
+one another. A separate feature is recorded for each
+end of an inversion, so this is really a count of
+inversion endpoints.
+Insertions     - Rough count of insertion events. Note that this is
+slightly different from "UnalignedBases" because it
+counts duplications as insertions, whereas
+UnalignedBases does not. Also, this count does not
+include sequences that have no alignments as
+insertions, whereas UnalignedBases does. Note that
+insertions in R can be viewed as deletions from Q.
+This number reports only "major" insertions defined
+as insertions large enough to break an alignment.
+Nucmer will align through smaller insertions of less
+than ~60 bases. These smaller insertions are
+reported in the "Indels" count below.
+InsertionSum   - Rough sum of inserted sequence.
+InsertionAvg   - Average length of insertion.
+TandemIns      - Rough count of tandem duplication insertion
+events. Note that expansions in R can be viewed as
+collapses in Q.
+TandemInsSum   - Rough sum of tandem duplication insertions.
+TandemInsAvg   - Average length of tandem duplications.
+[SNPs]         - Single Nucleotide Polymorphism counts.
+TotalSNPs      - Total number of SNPs, same for both sequences.
+XY             - X-to-Y SNP. For reference column, this means
+reference 'X' to query 'Y'. For query column, this
+means query 'X' to reference 'Y'. The same
+convention applies below.
+TotalGSNPs     - Single Nucleotide Polymorphisms bounded by 20 exact
+base-pair matches on both sides.
+TotalIndels    - Single Nucleotide Insertions/Deletions.
+X.             - X insertion. For reference column, 'X.' means
+insertion of 'X' in the reference. For query column,
+'X.' means insertion of 'X' in the query. Nucmer will
+align through group insertions of up to ~60 bases.
+Each base of these group insertions will be reported
+in this count. Large insertions will be reported in
+the "Insertions" count above.
+TotalGIndels   - Single Nucleotide Insertions/Deletions bounded by 20
+exact base-pair matches on both sides.
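The two-column layout described above can be read programmatically. The sketch below assumes, without confirmation from this README, that each statistic is a whitespace-separated row of the form `label reference_value query_value` and that each group starts with a bracketed header such as `[Sequences]` on its own line. The names `parse_report` and `leading_number` are hypothetical, and `SAMPLE` is made-up data for illustration, not real dnadiff output.

```python
import re


def parse_report(text):
    """Group report rows by section as (label, ref_value, qry_value) tuples."""
    sections = {}
    current = None
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1 and re.fullmatch(r"\[\w+\]", fields[0]):
            current = fields[0].strip("[]")
            sections[current] = []
        elif current is not None and len(fields) == 3:
            # Rows stay in a list because labels such as TotalLength
            # occur twice (once for 1-to-1 and once for M-to-M).
            sections[current].append(tuple(fields))
    return sections


def leading_number(value):
    """Return the number at the start of a value, or None if there is none."""
    match = re.match(r"-?\d+(?:\.\d+)?", value)
    return float(match.group()) if match else None


# Hypothetical sample in the assumed layout (illustrative values only).
SAMPLE = """\
[Sequences]
TotalSeqs          2          3
AlignedSeqs        2(100.00%) 2(66.67%)
[Alignments]
1-to-1             5          5
TotalLength        1000       1002
M-to-M             7          7
TotalLength        1500       1503
"""

report = parse_report(SAMPLE)
```

Values are kept as strings because some cells may carry extra annotation; `leading_number` extracts the numeric part when a number is needed.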
