
annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/opt/mummer-3.23/docs/dnadiff.README @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d

author	jpayne
date	Tue, 18 Mar 2025 17:55:14 -0400
--------------------------------------------------------------------------------
dnadiff is a wrapper for nucmer and analysis utilities that provides
detailed information on the differences between two genomes, and also
provides a high-level report file that quantifies the differences
between the two inputs.

Use Cases:
  + diff'ing two strains of the same species
  + diff'ing two assemblies of the same organism
  + diff'ing a draft assembly and a closely related finished genome

If any of this code is used in any publication, please cite the following:

  Versatile and open software for comparing large genomes.
  S. Kurtz, A. Phillippy, A.L. Delcher,
  M. Smoot, M. Shumway, C. Antonescu, and S.L. Salzberg.
  Genome Biology (2004), 5:R12.

--------------------------------------------------------------------------------

This manual is also available as HTML documentation included in this
distribution, or at:

  http://mummer.sourceforge.net
  http://mummer.sourceforge.net/manual
  http://mummer.sourceforge.net/examples


-- DESCRIPTION --
dnadiff is a wrapper around nucmer that builds an alignment using
default parameters, and runs many of nucmer's helper scripts to
process the output and report alignment statistics, SNPs, breakpoints,
etc. It is designed for evaluating the sequence and structural
similarity of two highly similar sequence sets, for example when
comparing two different assemblies of the same organism, or two
strains of the same species.


-- dnadiff EXAMPLE --
To compare two strains of the same species, type:

  dnadiff genome1.fna genome2.fna

The output will be:
  out.report  - Summary of alignments, differences and SNPs
  out.delta   - Standard nucmer alignment output
  out.1delta  - 1-to-1 alignment from delta-filter -1
  out.mdelta  - M-to-M alignment from delta-filter -m
  out.1coords - 1-to-1 coordinates from show-coords -THrcl .1delta
  out.mcoords - M-to-M coordinates from show-coords -THrcl .mdelta
  out.snps    - SNPs from show-snps -rlTHC .1delta
  out.rdiff   - Classified ref breakpoints from show-diff -rH .mdelta
  out.qdiff   - Classified qry breakpoints from show-diff -qH .mdelta
  out.unref   - Unaligned reference sequence IDs and lengths
  out.unqry   - Unaligned query sequence IDs and lengths
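A pipeline that calls dnadiff can check for these files afterwards. The sketch below is a minimal illustration under stated assumptions: the function names `expected_outputs` and `missing_outputs` are hypothetical, the suffix list is copied from the output list above, and it assumes every listed file is written on each run.

```python
from pathlib import Path

# Suffixes copied from the output list above; dnadiff appends each one to the
# output prefix (default "out"), e.g. out.report, out.1coords.
DNADIFF_SUFFIXES = (
    "report", "delta", "1delta", "mdelta", "1coords", "mcoords",
    "snps", "rdiff", "qdiff", "unref", "unqry",
)


def expected_outputs(prefix: str = "out") -> list[Path]:
    """Hypothetical helper: the paths a dnadiff run with `prefix` should write."""
    return [Path(f"{prefix}.{suffix}") for suffix in DNADIFF_SUFFIXES]


def missing_outputs(prefix: str = "out") -> list[Path]:
    """Return the expected output files that do not exist (yet)."""
    return [path for path in expected_outputs(prefix) if not path.exists()]
```

After a run with "-p A_B", an empty list from `missing_outputs("A_B")` would indicate that all eleven files are present.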

For more information on the formats and meanings of all the files
produced, please see the documentation for the corresponding
utility. This document describes how to run the dnadiff script and
how to interpret the .report file it produces.


-- RUNNING 'dnadiff' --

  USAGE: dnadiff  [options]  <Reference>  <Query>
      or dnadiff  [options]  -d <Delta File>

  DESCRIPTION:
    Run comparative analysis of two sequence sets using nucmer and its
    associated utilities with recommended parameters. See MUMmer
    documentation for a more detailed description of the
    output. Produces the following output files:

    .delta    - Standard nucmer alignment output
    .1delta   - 1-to-1 alignment from delta-filter -1
    .mdelta   - M-to-M alignment from delta-filter -m
    .1coords  - 1-to-1 coordinates from show-coords -THrcl .1delta
    .mcoords  - M-to-M coordinates from show-coords -THrcl .mdelta
    .snps     - SNPs from show-snps -rlTHC .1delta
    .rdiff    - Classified alignment breakpoints from show-diff -rH .mdelta
    .qdiff    - Classified alignment breakpoints from show-diff -qH .mdelta
    .report   - Summary of alignments, differences and SNPs
    .unref    - Unaligned reference sequence IDs and lengths
    .unqry    - Unaligned query sequence IDs and lengths

  MANDATORY:
    Reference       Set the input reference multi-FASTA filename
    Query           Set the input query multi-FASTA filename
      or
    Delta File      Unfiltered .delta alignment file from nucmer

  OPTIONS:
    -d|delta        Provide precomputed delta file for analysis
    -h
    --help          Display help information and exit
    -p|prefix       Set the prefix of the output files (default "out")
    -V
    --version       Display the version information and exit
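When dnadiff is driven from a script, it can help to build the argument list explicitly for each of the two usage forms. This is a sketch only: `dnadiff_command` and `run_dnadiff` are hypothetical helpers, only the -p and -d options documented above are used, and `dnadiff` is assumed to be on the PATH.

```python
import subprocess
from typing import Optional


def dnadiff_command(reference: Optional[str] = None, query: Optional[str] = None,
                    delta: Optional[str] = None, prefix: str = "out") -> list[str]:
    """Build argv for "dnadiff [options] <Reference> <Query>" or "... -d <Delta File>"."""
    cmd = ["dnadiff", "-p", prefix]          # -p sets the output file prefix
    if delta is not None:
        if reference is not None or query is not None:
            raise ValueError("pass either a delta file or reference and query, not both")
        cmd += ["-d", delta]                 # precomputed, unfiltered nucmer delta
    elif reference is not None and query is not None:
        cmd += [reference, query]
    else:
        raise ValueError("need reference and query FASTA files, or a delta file")
    return cmd


def run_dnadiff(reference: Optional[str] = None, query: Optional[str] = None,
                delta: Optional[str] = None, prefix: str = "out") -> None:
    """Run dnadiff; raises CalledProcessError if it exits with a non-zero status."""
    subprocess.run(dnadiff_command(reference, query, delta, prefix), check=True)
```

For example, `run_dnadiff("A.fna", "B.fna", prefix="A_B")` would execute `dnadiff -p A_B A.fna B.fna`. Passing a list rather than a shell string avoids quoting problems with unusual file names.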


-- NOTES --
The -p option is recommended to avoid overwriting previous
output. A simple naming convention for files A.fna and B.fna is to
set "-p A_B". It is safest to let dnadiff run nucmer automatically, so
avoid using the -d option unless the delta file was already generated
with "nucmer --maxmatch" and has not been filtered.
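The A_B naming convention is easy to automate. In this sketch, `pair_prefix` is a hypothetical helper that uses each file name without its directory or final extension, so a name such as A.fna.gz would contribute "A.fna" rather than "A".

```python
from pathlib import Path


def pair_prefix(reference: str, query: str) -> str:
    """Build an "A_B" output prefix from two FASTA paths, e.g. A.fna and B.fna."""
    # Path(...).stem drops the directory and the final extension only.
    return f"{Path(reference).stem}_{Path(query).stem}"
```

The result can then be passed to dnadiff as the value of -p.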


-- OUTPUT FILES --
dnadiff produces many outputs; however, all but one are produced by
other utilities in the MUMmer package. Please see their corresponding
documentation for more information. This section only describes the
.report file generated by dnadiff and gives tips on interpreting it.


* .report OUTPUT *

Report statistics are broken into two columns: reference and
query. Rows are grouped by themed alignment metrics and are described
here. Summary counts are estimates and do not represent the exact
number of occurrences of a particular evolutionary event. When reading
the reference column, think "number of XYZ in the reference with
regard to the query". When reading the query column, think "number of
XYZ in the query with regard to the reference".
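To work with the report programmatically, the two-column layout can be read into rows. This sketch assumes that section headings appear on their own line in square brackets (as in the headings listed below) and that each statistic is one line holding a row name, a reference value and a query value separated by whitespace, where a value may carry a percentage in parentheses; check these assumptions against a real .report file. The names `parse_report` and `leading_number` are hypothetical. Rows are returned as a list rather than a dictionary because names such as TotalLength occur twice within [Alignments].

```python
import re

# Assumed value format: a number, optionally followed by "(xx.xx%)".
_VALUE = re.compile(r"^(-?\d+(?:\.\d+)?)(?:\(-?\d+(?:\.\d+)?%\))?$")


def parse_report(text: str) -> list[tuple[str, str, str, str]]:
    """Return (section, row name, reference value, query value) rows in file order."""
    rows = []
    section = ""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 1 and fields[0].startswith("[") and fields[0].endswith("]"):
            section = fields[0][1:-1]        # "[Bases]" -> "Bases"
        elif len(fields) == 3 and section:
            name, ref_value, qry_value = fields
            rows.append((section, name, ref_value, qry_value))
    return rows


def leading_number(value: str) -> float:
    """Numeric part of a report value, e.g. "1(100.00%)" -> 1.0."""
    match = _VALUE.match(value)
    if match is None:
        raise ValueError(f"unexpected report value: {value!r}")
    return float(match.group(1))
```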

[Sequences]     - Sequence-centric stats.
TotalSeqs       - Total number of input sequences.
AlignedSeqs     - Number of input sequences with at least one alignment.
UnalignedSeqs   - Number of input sequences with no alignment.

[Bases]         - Base-pair-centric stats.
TotalBases      - Total number of bases in the input sequences.
AlignedBases    - Total number of bases contained within an alignment.
UnalignedBases  - Total number of unaligned bases. This is a rough
                  measure for the amount of "unique" sequence in the
                  reference and query.
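The [Sequences] and [Bases] rows can be approximated from alignment coordinates: a base counts as aligned if at least one alignment covers it, so overlapping alignments must be merged before summing. The sketch below assumes 1-based inclusive intervals per sequence (for example taken from the .mcoords columns); which alignment set dnadiff itself uses for these rows is not stated here, and `covered_bases` and `base_stats` are hypothetical names.

```python
def covered_bases(intervals: list[tuple[int, int]]) -> int:
    """Count bases covered by at least one 1-based, inclusive interval."""
    covered = 0
    current_start = current_end = None
    # Normalise reversed coordinates (reverse-strand hits) and sweep left to right.
    for start, end in sorted((min(s, e), max(s, e)) for s, e in intervals):
        if current_end is None or start > current_end + 1:
            if current_end is not None:
                covered += current_end - current_start + 1
            current_start, current_end = start, end
        else:
            current_end = max(current_end, end)   # overlap: extend the merged run
    if current_end is not None:
        covered += current_end - current_start + 1
    return covered


def base_stats(seq_lengths: dict[str, int],
               alignments: dict[str, list[tuple[int, int]]]) -> dict[str, int]:
    """Sequence and base counts for one side (reference or query)."""
    total = sum(seq_lengths.values())
    aligned = sum(covered_bases(alignments.get(name, [])) for name in seq_lengths)
    aligned_seqs = sum(1 for name in seq_lengths if alignments.get(name))
    return {
        "TotalSeqs": len(seq_lengths),
        "AlignedSeqs": aligned_seqs,
        "UnalignedSeqs": len(seq_lengths) - aligned_seqs,
        "TotalBases": total,
        "AlignedBases": aligned,
        "UnalignedBases": total - aligned,
    }
```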

[Alignments]    - Alignment-centric stats.
1-to-1          - Number of alignment blocks comprising the 1-to-1
                  mapping of reference to query. This is a subset of
                  the M-to-M mapping, with repeats removed.
TotalLength     - Total length of 1-to-1 alignment blocks.
AvgLength       - Average length of 1-to-1 alignment blocks.
AvgIdentity     - Average identity of 1-to-1 alignment blocks.

M-to-M          - Number of alignment blocks comprising the
                  many-to-many mapping of reference to query. The
                  M-to-M mapping represents the smallest set of
                  alignments that maximize the coverage of both
                  reference and query. This is a superset of the 1-to-1
                  mapping.
TotalLength     - Total length of M-to-M alignment blocks.
AvgLength       - Average length of M-to-M alignment blocks.
AvgIdentity     - Average identity of M-to-M alignment blocks.
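The 1-to-1 and M-to-M rows can be recomputed from the .1coords and .mcoords files. This sketch assumes the tab-delimited column order of show-coords -THrcl (S1, E1, S2, E2, LEN 1, LEN 2, % IDY, then lengths, coverages and tags), and it weights each block's identity by its combined reference and query length; whether dnadiff averages identity this way is an assumption. `alignment_stats` is a hypothetical name, and its "Blocks" entry corresponds to the 1-to-1 or M-to-M count row.

```python
def alignment_stats(coords_text: str) -> dict[str, tuple[float, float]]:
    """Summarize show-coords -THrcl rows as (reference, query) value pairs."""
    blocks = 0
    ref_total = qry_total = 0
    weighted_identity = 0.0
    for line in coords_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        len1, len2 = int(fields[4]), int(fields[5])   # [LEN 1], [LEN 2]
        identity = float(fields[6])                   # [% IDY]
        blocks += 1
        ref_total += len1
        qry_total += len2
        # Assumption: weight each block's identity by its combined length.
        weighted_identity += identity * (len1 + len2)
    if blocks == 0:
        return {"Blocks": (0, 0), "TotalLength": (0, 0),
                "AvgLength": (0.0, 0.0), "AvgIdentity": (0.0, 0.0)}
    avg_identity = weighted_identity / (ref_total + qry_total)
    return {
        "Blocks": (blocks, blocks),
        "TotalLength": (ref_total, qry_total),
        "AvgLength": (ref_total / blocks, qry_total / blocks),
        "AvgIdentity": (avg_identity, avg_identity),
    }
```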

[Features]      - Structural alignment features, such as
                  rearrangements. These counts are rough estimates
                  based on an automated analysis of the
                  alignments. Features are identified by scanning the
                  reference (or query) from low to high, and noting the
                  positions where the query alignments are
                  inconsistently ordered or oriented with respect to
                  the reference.
Breakpoints     - Number of non-maximal alignment endpoints,
                  i.e. endpoints that do not occur at the beginning or
                  end of a sequence.
Relocations     - Number of breaks in the alignment where adjacent
                  1-to-1 alignment blocks are in the same sequence, but
                  not consistently ordered. A separate feature is
                  recorded for each end of a relocation, so this is
                  really a count of relocation endpoints.
Translocations  - Number of breaks in the alignment where adjacent
                  1-to-1 alignment blocks are in different sequences. A
                  separate feature is recorded for each end of a
                  translocation, so this is really a count of
                  translocation endpoints.
Inversions      - Number of breaks in the alignment where adjacent
                  1-to-1 alignment blocks are inverted with respect to
                  one another. A separate feature is recorded for each
                  end of an inversion, so this is really a count of
                  inversion endpoints.
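The scan described above can be illustrated with a deliberately simplified sketch. It walks 1-to-1 blocks in reference order and labels each pair of neighbours on the same reference sequence as a translocation (different query sequence), an inversion (different orientation) or a relocation (same sequence and orientation, but out of order in the query). It ignores overlaps, tolerances and the query-side scan, so it is not the show-diff algorithm; `Block`, `classify_breaks` and `count_breakpoints` are hypothetical names.

```python
from typing import NamedTuple


class Block(NamedTuple):
    """One 1-to-1 alignment block; qry_start > qry_end means reverse strand."""
    ref: str
    ref_start: int
    ref_end: int
    qry: str
    qry_start: int
    qry_end: int

    @property
    def forward(self) -> bool:
        return self.qry_start <= self.qry_end


def classify_breaks(blocks: list[Block]) -> list[tuple[str, Block, Block]]:
    """Scan each reference low to high and label inconsistent neighbours."""
    ordered = sorted(blocks, key=lambda b: (b.ref, b.ref_start))
    breaks = []
    for left, right in zip(ordered, ordered[1:]):
        if left.ref != right.ref:
            continue                      # blocks on different references are not adjacent
        if left.qry != right.qry:
            kind = "translocation"
        elif left.forward != right.forward:
            kind = "inversion"
        elif left.forward and right.qry_start < left.qry_end:
            kind = "relocation"           # forward blocks should ascend in the query
        elif not left.forward and right.qry_start > left.qry_end:
            kind = "relocation"           # reverse blocks should descend in the query
        else:
            continue                      # consistently ordered and oriented
        breaks.append((kind, left, right))
    return breaks


def count_breakpoints(blocks: list[Block], ref_lengths: dict[str, int]) -> int:
    """Count reference-side endpoints that are not position 1 or the sequence end."""
    count = 0
    for block in blocks:
        for pos in (block.ref_start, block.ref_end):
            if pos not in (1, ref_lengths[block.ref]):
                count += 1
    return count
```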

Insertions      - Rough count of insertion events. Note that this is
                  slightly different from "UnalignedBases" because it
                  counts duplications as insertions, whereas
                  UnalignedBases does not. Also, this count does not
                  include sequences that have no alignments as
                  insertions, whereas UnalignedBases does. Note that
                  insertions in R can be viewed as deletions from Q.
                  This number reports only "major" insertions, defined
                  as insertions large enough to break an alignment.
                  Nucmer will align through smaller insertions of less
                  than ~60 bases. These smaller insertions are
                  reported in the "Indels" count below.
InsertionSum    - Rough sum of inserted sequence.
InsertionAvg    - Average length of insertion.
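One rough way to picture the Insertions rows: between two colinear neighbouring blocks, compare the gap on the reference with the gap on the query; any excess on the reference side is sequence present in R but not in Q. This is an illustrative assumption, not a description of how dnadiff computes these rows; `insertion_stats` is a hypothetical name and handles only forward-strand blocks between one reference/query pair. A query overlap (negative query gap) also shows up as excess reference sequence, which loosely mirrors the note above that duplications are counted as insertions.

```python
def insertion_stats(blocks: list[tuple[int, int, int, int]]) -> tuple[int, int, float]:
    """Estimate (count, sum, average) of reference-side insertions.

    Each block is (ref_start, ref_end, qry_start, qry_end) for one reference/query
    sequence pair, forward strand, already sorted by ref_start.
    """
    count = total = 0
    for (_, r_end, _, q_end), (r_start, _, q_start, _) in zip(blocks, blocks[1:]):
        ref_gap = r_start - r_end - 1
        qry_gap = q_start - q_end - 1      # negative if the query blocks overlap
        extra = ref_gap - qry_gap          # positive: more sequence in R than in Q
        if extra > 0:
            count += 1
            total += extra
    average = total / count if count else 0.0
    return count, total, average
```

For the query column, the same function can be applied with the reference and query coordinates swapped in each tuple and the list re-sorted by query start.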

TandemIns       - Rough count of tandem duplication insertion
                  events. Note that expansions in R can be viewed as
                  collapses in Q.
TandemInsSum    - Rough sum of tandem duplication insertions.
TandemInsAvg    - Average length of tandem duplications.

[SNPs]          - Single Nucleotide Polymorphism counts.
TotalSNPs       - Total number of SNPs, same for both sequences.
XY              - X-to-Y SNP. For the reference column, this means
                  reference 'X' to query 'Y'. For the query column, this
                  means query 'X' to reference 'Y'. The same
                  convention applies below.

TotalGSNPs      - Single Nucleotide Polymorphisms bounded by 20 exact
                  base-pair matches on both sides.
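The SNP rows can be tallied from the .snps file. This sketch assumes the tab-delimited column order of show-snps -rlTHC (P1, reference base, query base, P2, BUFF, DIST, then lengths, frames and tags), skips rows where either base is "." (those are indels), and treats a SNP as a "G" SNP when BUFF, the distance to the nearest mismatch or alignment end, is at least 20. That threshold test is an assumption about how "bounded by 20 exact matches" is implemented, which is why it is a parameter. `snp_stats` is a hypothetical name.

```python
from collections import Counter


def snp_stats(snps_text: str, min_buffer: int = 20) -> dict[str, object]:
    """Tally substitutions from .snps rows using the XY convention above."""
    ref_xy: Counter = Counter()   # reference column: reference X to query Y
    qry_xy: Counter = Counter()   # query column: query X to reference Y
    total = good = 0
    for line in snps_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ref_base, qry_base, buff = fields[1], fields[2], int(fields[4])
        if "." in (ref_base, qry_base):
            continue                          # indel row, not a substitution
        total += 1
        if buff >= min_buffer:                # assumed "G" (bounded) criterion
            good += 1
        ref_xy[ref_base + qry_base] += 1
        qry_xy[qry_base + ref_base] += 1
    return {"TotalSNPs": total, "TotalGSNPs": good, "ref": ref_xy, "qry": qry_xy}
```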

TotalIndels     - Single Nucleotide Insertions/Deletions.
X.              - X insertion. For the reference column, 'X.' means
                  insertion of 'X' in the reference. For the query
                  column, 'X.' means insertion of 'X' in the query.
                  Nucmer will align through group insertions of up to
                  ~60 bases. Each base of these group insertions will be
                  reported in this count. Large insertions will be
                  reported in the "Insertions" count above.

TotalGIndels    - Single Nucleotide Insertions/Deletions bounded by 20
                  exact base-pair matches on both sides.
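Single-base indels appear in the .snps file with "." in place of the missing base. Following the 'X.' convention above, this sketch counts a row with reference base X and query base "." as 'X.' in the reference column, and the reverse for the query column. It assumes the show-snps -rlTHC column order (bases in columns 2 and 3, BUFF in column 5) and an assumed BUFF >= 20 test for TotalGIndels; `indel_stats` is a hypothetical name.

```python
from collections import Counter


def indel_stats(snps_text: str, min_buffer: int = 20) -> dict[str, object]:
    """Tally single-base indels from .snps rows, where "." marks a gap."""
    ref_ins: Counter = Counter()   # 'X.' in the reference column: X present in R only
    qry_ins: Counter = Counter()   # 'X.' in the query column: X present in Q only
    good_ref = good_qry = 0
    for line in snps_text.splitlines():
        if not line.strip():
            continue
        fields = line.split("\t")
        ref_base, qry_base, buff = fields[1], fields[2], int(fields[4])
        if qry_base == "." and ref_base != ".":
            ref_ins[ref_base + "."] += 1
            if buff >= min_buffer:             # assumed "G" (bounded) criterion
                good_ref += 1
        elif ref_base == "." and qry_base != ".":
            qry_ins[qry_base + "."] += 1
            if buff >= min_buffer:
                good_qry += 1
    return {
        "TotalIndels": (sum(ref_ins.values()), sum(qry_ins.values())),
        "TotalGIndels": (good_ref, good_qry),
        "ref": ref_ins,
        "qry": qry_ins,
    }
```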
