annotate CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/xmlwf.1 @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
rev   line source
jpayne@68 1 '\" -*- coding: us-ascii -*-
jpayne@68 2 .if \n(.g .ds T< \\FC
jpayne@68 3 .if \n(.g .ds T> \\F[\n[.fam]]
jpayne@68 4 .de URL
jpayne@68 5 \\$2 \(la\\$1\(ra\\$3
jpayne@68 6 ..
jpayne@68 7 .if \n(.g .mso www.tmac
jpayne@68 8 .TH XMLWF 1 "November 6, 2024" "" ""
jpayne@68 9 .SH NAME
jpayne@68 10 xmlwf \- Determines if an XML document is well-formed
jpayne@68 11 .SH SYNOPSIS
jpayne@68 12 'nh
jpayne@68 13 .fi
jpayne@68 14 .ad l
jpayne@68 15 \fBxmlwf\fR \kx
jpayne@68 16 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
jpayne@68 17 'in \n(.iu+\nxu
jpayne@68 18 [\fIOPTIONS\fR] [\fIFILE\fR ...]
jpayne@68 19 'in \n(.iu-\nxu
jpayne@68 20 .ad b
jpayne@68 21 'hy
jpayne@68 22 'nh
jpayne@68 23 .fi
jpayne@68 24 .ad l
jpayne@68 25 \fBxmlwf\fR \kx
jpayne@68 26 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
jpayne@68 27 'in \n(.iu+\nxu
jpayne@68 28 \fB-h\fR | \fB--help\fR
jpayne@68 29 'in \n(.iu-\nxu
jpayne@68 30 .ad b
jpayne@68 31 'hy
jpayne@68 32 'nh
jpayne@68 33 .fi
jpayne@68 34 .ad l
jpayne@68 35 \fBxmlwf\fR \kx
jpayne@68 36 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
jpayne@68 37 'in \n(.iu+\nxu
jpayne@68 38 \fB-v\fR | \fB--version\fR
jpayne@68 39 'in \n(.iu-\nxu
jpayne@68 40 .ad b
jpayne@68 41 'hy
jpayne@68 42 .SH DESCRIPTION
jpayne@68 43 \fBxmlwf\fR uses the Expat library to
jpayne@68 44 determine if an XML document is well-formed. It is
jpayne@68 45 non-validating.
jpayne@68 46 .PP
jpayne@68 47 If you do not specify any files on the command line, and you
jpayne@68 48 have a recent version of \fBxmlwf\fR, the
jpayne@68 49 input file will be read from standard input.
jpayne@68 50 .SH "WELL-FORMED DOCUMENTS"
jpayne@68 51 A well-formed document must adhere to the
jpayne@68 52 following rules:
jpayne@68 53 .TP 0.2i
jpayne@68 54 \(bu
jpayne@68 55 The file begins with an XML declaration. For instance,
jpayne@68 56 \*(T<<?xml version="1.0" standalone="yes"?>\*(T>.
jpayne@68 57 \fINOTE\fR:
jpayne@68 58 \fBxmlwf\fR does not currently
jpayne@68 59 check for a valid XML declaration.
jpayne@68 60 .TP 0.2i
jpayne@68 61 \(bu
jpayne@68 62 Every start tag is either empty (<tag/>)
jpayne@68 63 or has a corresponding end tag.
jpayne@68 64 .TP 0.2i
jpayne@68 65 \(bu
jpayne@68 66 There is exactly one root element. This element must contain
jpayne@68 67 all other elements in the document. Only comments, white
jpayne@68 68 space, and processing instructions may come after the close
jpayne@68 69 of the root element.
jpayne@68 70 .TP 0.2i
jpayne@68 71 \(bu
jpayne@68 72 All elements nest properly.
jpayne@68 73 .TP 0.2i
jpayne@68 74 \(bu
jpayne@68 75 All attribute values are enclosed in quotes (either single
jpayne@68 76 or double).
jpayne@68 77 .PP
jpayne@68 78 If the document has a DTD, and it strictly complies with that
jpayne@68 79 DTD, then the document is also considered \fIvalid\fR.
jpayne@68 80 \fBxmlwf\fR is a non-validating parser --
jpayne@68 81 it does not check the DTD. However, it does support
jpayne@68 82 external entities (see the \*(T<\fB\-x\fR\*(T> option).
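.PP
The rules above can be tried out with a minimal sketch that uses
Python's standard \*(T<xml.parsers.expat\*(T> module, which wraps the same
Expat library.
This is an illustration only, not how \fBxmlwf\fR itself is implemented;
the helper name \*(T<is_well_formed\*(T> is hypothetical.
.nf

```python
import xml.parsers.expat
from typing import Tuple


def is_well_formed(data: bytes) -> Tuple[bool, str]:
    # Hypothetical helper: run a non-validating Expat parse over the
    # whole document and report the first error, if any.
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this as the final chunk of input.
        parser.Parse(data, True)
    except xml.parsers.expat.ExpatError as err:
        return False, str(err)
    return True, ""


if __name__ == "__main__":
    samples = [
        b"<root><a/><b>text</b></root>",  # nests properly
        b"<root><a></root>",              # start tag without end tag
        b"<root a=1/>",                   # attribute value not quoted
        b"<a/><b/>",                      # more than one root element
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```
.fi
.PP
Like \fBxmlwf\fR, this check is non-validating: it does not
compare the document against a DTD.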
jpayne@68 83 .SH OPTIONS
jpayne@68 84 When an option includes an argument, you may specify the argument either
jpayne@68 85 separately ("\*(T<\fB\-d\fR\*(T> \fIoutput\fR") or concatenated with the
jpayne@68 86 option ("\*(T<\fB\-d\fR\*(T>\fIoutput\fR"). \fBxmlwf\fR
jpayne@68 87 supports both.
jpayne@68 88 .TP
jpayne@68 89 \*(T<\fB\-a\fR\*(T> \fIfactor\fR
jpayne@68 90 Sets the maximum tolerated amplification factor
jpayne@68 91 for protection against billion laughs attacks (default: 100.0).
jpayne@68 92 The amplification factor is calculated as ..
jpayne@68 93
jpayne@68 94 .nf
jpayne@68 95
jpayne@68 96 amplification := (direct + indirect) / direct
jpayne@68 97
jpayne@68 98 .fi
jpayne@68 99
jpayne@68 100 \&.. while parsing, where
jpayne@68 101 <direct> is the number of bytes read
jpayne@68 102 from the primary document in parsing and
jpayne@68 103 <indirect> is the number of bytes
jpayne@68 104 added by expanding entities and reading of external DTD files,
jpayne@68 105 combined.
jpayne@68 106
jpayne@68 107 \fINOTE\fR:
jpayne@68 108 If you ever need to increase this value for a non-attack payload,
jpayne@68 109 please file a bug report.
jpayne@68 110 .TP
jpayne@68 111 \*(T<\fB\-b\fR\*(T> \fIbytes\fR
jpayne@68 112 Sets the number of output bytes (including amplification)
jpayne@68 113 needed to activate protection against billion laughs attacks
jpayne@68 114 (default: 8 MiB).
jpayne@68 115 This can be thought of as an "activation threshold".
jpayne@68 116
jpayne@68 117 \fINOTE\fR:
jpayne@68 118 If you ever need to increase this value for a non-attack payload,
jpayne@68 119 please file a bug report.
jpayne@68 120 .TP
jpayne@68 121 \*(T<\fB\-c\fR\*(T>
jpayne@68 122 If the input file is well-formed and \fBxmlwf\fR
jpayne@68 123 doesn't encounter any errors, the input file is simply copied to
jpayne@68 124 the output directory unchanged.
jpayne@68 125 This implies no namespaces (turns off \*(T<\fB\-n\fR\*(T>) and
jpayne@68 126 requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
jpayne@68 127 .TP
jpayne@68 128 \*(T<\fB\-d\fR\*(T> \fIoutput-dir\fR
jpayne@68 129 Specifies a directory to contain transformed
jpayne@68 130 representations of the input files.
jpayne@68 131 By default, \*(T<\fB\-d\fR\*(T> outputs a canonical representation
jpayne@68 132 (described below).
jpayne@68 133 You can select different output formats using \*(T<\fB\-c\fR\*(T>,
jpayne@68 134 \*(T<\fB\-m\fR\*(T> and \*(T<\fB\-N\fR\*(T>.
jpayne@68 135
jpayne@68 136 The output filenames will
jpayne@68 137 be exactly the same as the input filenames or "STDIN" if the input is
jpayne@68 138 coming from standard input. Therefore, you must be careful that the
jpayne@68 139 output file does not go into the same directory as the input
jpayne@68 140 file. Otherwise, \fBxmlwf\fR will delete the
jpayne@68 141 input file before it generates the output file (just like running
jpayne@68 142 \*(T<cat < file > file\*(T> in most shells).
jpayne@68 143
jpayne@68 144 Two structurally equivalent XML documents have a byte-for-byte
jpayne@68 145 identical canonical XML representation.
jpayne@68 146 Note that ignorable white space is considered significant and
jpayne@68 147 is treated equivalently to data.
jpayne@68 148 More on canonical XML can be found at
jpayne@68 149 http://www.jclark.com/xml/canonxml.html .
jpayne@68 150 .TP
jpayne@68 151 \*(T<\fB\-e\fR\*(T> \fIencoding\fR
jpayne@68 152 Specifies the character encoding for the document, overriding
jpayne@68 153 any document encoding declaration. \fBxmlwf\fR
jpayne@68 154 supports four built-in encodings:
jpayne@68 155 \*(T<US\-ASCII\*(T>,
jpayne@68 156 \*(T<UTF\-8\*(T>,
jpayne@68 157 \*(T<UTF\-16\*(T>, and
jpayne@68 158 \*(T<ISO\-8859\-1\*(T>.
jpayne@68 159 Also see the \*(T<\fB\-w\fR\*(T> option.
jpayne@68 160 .TP
jpayne@68 161 \*(T<\fB\-g\fR\*(T> \fIbytes\fR
jpayne@68 162 Sets the buffer size to request per call pair to
jpayne@68 163 \*(T<\fBXML_GetBuffer\fR\*(T> and \*(T<\fBread\fR\*(T>
jpayne@68 164 (default: 8 KiB).
jpayne@68 165 .TP
jpayne@68 166 \*(T<\fB\-h\fR\*(T>, \*(T<\fB\-\-help\fR\*(T>
jpayne@68 167 Prints short usage information for the \fBxmlwf\fR command,
jpayne@68 168 and then exits.
jpayne@68 169 Similar to this man page but more concise.
jpayne@68 170 .TP
jpayne@68 171 \*(T<\fB\-k\fR\*(T>
jpayne@68 172 When processing multiple files, \fBxmlwf\fR
jpayne@68 173 by default halts after the first file with an error.
jpayne@68 174 This tells \fBxmlwf\fR to report the error
jpayne@68 175 but to keep processing.
jpayne@68 176 This can be useful, for example, when testing a filter that converts
jpayne@68 177 many files to XML and you want to quickly find out which conversions
jpayne@68 178 failed.
jpayne@68 179 .TP
jpayne@68 180 \*(T<\fB\-m\fR\*(T>
jpayne@68 181 Outputs an XML file of metadata that completely
jpayne@68 182 describes the input file, including character positions.
jpayne@68 183 Requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
jpayne@68 184 .TP
jpayne@68 185 \*(T<\fB\-n\fR\*(T>
jpayne@68 186 Turns on namespace processing.
jpayne@68 187 \*(T<\fB\-c\fR\*(T> disables namespaces.
jpayne@68 188 .TP
jpayne@68 189 \*(T<\fB\-N\fR\*(T>
jpayne@68 190 Adds a doctype and notation declarations to canonical XML output.
jpayne@68 191 This matches the example output used by the formal XML test cases.
jpayne@68 192 Requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
jpayne@68 193 .TP
jpayne@68 194 \*(T<\fB\-p\fR\*(T>
jpayne@68 195 Tells \fBxmlwf\fR to process external DTDs and parameter
jpayne@68 196 entities.
jpayne@68 197
jpayne@68 198 Normally \fBxmlwf\fR never parses parameter
jpayne@68 199 entities. \*(T<\fB\-p\fR\*(T> tells it to always parse them.
jpayne@68 200 \*(T<\fB\-p\fR\*(T> implies \*(T<\fB\-x\fR\*(T>.
jpayne@68 201 .TP
jpayne@68 202 \*(T<\fB\-q\fR\*(T>
jpayne@68 203 Disables reparse deferral and allows quadratic parse runtime
jpayne@68 204 on large tokens (default: reparse deferral enabled).
jpayne@68 205 .TP
jpayne@68 206 \*(T<\fB\-r\fR\*(T>
jpayne@68 207 Normally \fBxmlwf\fR memory-maps the XML file
jpayne@68 208 before parsing; this can result in faster parsing on many
jpayne@68 209 platforms.
jpayne@68 210 \*(T<\fB\-r\fR\*(T> turns off memory-mapping and uses normal file
jpayne@68 211 IO calls instead.
jpayne@68 212 Of course, memory-mapping is automatically turned off
jpayne@68 213 when reading from standard input.
jpayne@68 214
jpayne@68 215 Use of memory-mapping can cause some platforms to report
jpayne@68 216 substantially higher memory usage for
jpayne@68 217 \fBxmlwf\fR, but this appears to be a matter of
jpayne@68 218 the operating system reporting memory in a strange way; there is
jpayne@68 219 not a leak in \fBxmlwf\fR.
jpayne@68 220 .TP
jpayne@68 221 \*(T<\fB\-s\fR\*(T>
jpayne@68 222 Prints an error if the document is not standalone.
jpayne@68 223 A document is standalone if it has no external subset and no
jpayne@68 224 references to parameter entities.
jpayne@68 225 .TP
jpayne@68 226 \*(T<\fB\-t\fR\*(T>
jpayne@68 227 Turns on timings. This tells Expat to parse the entire file,
jpayne@68 228 but not perform any processing.
jpayne@68 229 This gives a fairly accurate idea of the raw speed of Expat itself
jpayne@68 230 without client overhead.
jpayne@68 231 \*(T<\fB\-t\fR\*(T> turns off most of the output options
jpayne@68 232 (\*(T<\fB\-d\fR\*(T>, \*(T<\fB\-m\fR\*(T>, \*(T<\fB\-c\fR\*(T>, ...).
jpayne@68 233 .TP
jpayne@68 234 \*(T<\fB\-v\fR\*(T>, \*(T<\fB\-\-version\fR\*(T>
jpayne@68 235 Prints the version of the Expat library being used, including some
jpayne@68 236 information on the compile-time configuration of the library, and
jpayne@68 237 then exits.
jpayne@68 238 .TP
jpayne@68 239 \*(T<\fB\-w\fR\*(T>
jpayne@68 240 Enables support for Windows code pages.
jpayne@68 241 Normally, \fBxmlwf\fR will throw an error if it
jpayne@68 242 runs across an encoding that it is not equipped to handle itself. With
jpayne@68 243 \*(T<\fB\-w\fR\*(T>, \fBxmlwf\fR will try to use a Windows code
jpayne@68 244 page. See also \*(T<\fB\-e\fR\*(T>.
jpayne@68 245 .TP
jpayne@68 246 \*(T<\fB\-x\fR\*(T>
jpayne@68 247 Turns on parsing external entities.
jpayne@68 248
jpayne@68 249 Non-validating parsers are not required to resolve external
jpayne@68 250 entities, or even expand entities at all.
jpayne@68 251 Expat always expands internal entities (?),
jpayne@68 252 but external entity parsing must be enabled explicitly.
jpayne@68 253
jpayne@68 254 External entities are simply entities that obtain their
jpayne@68 255 data from outside the XML file currently being parsed.
jpayne@68 256
jpayne@68 257 This is an example of an internal entity:
jpayne@68 258
jpayne@68 259 .nf
jpayne@68 260
jpayne@68 261 <!ENTITY vers '1.0.2'>
jpayne@68 262 .fi
jpayne@68 263
jpayne@68 264 And here are some examples of external entities:
jpayne@68 265
jpayne@68 266 .nf
jpayne@68 267
jpayne@68 268 <!ENTITY header SYSTEM "header\-&vers;.xml"> (parsed)
jpayne@68 269 <!ENTITY logo SYSTEM "logo.png" NDATA PNG> (unparsed)
jpayne@68 270 .fi
jpayne@68 271 .TP
jpayne@68 272 \*(T<\fB\-\-\fR\*(T>
jpayne@68 273 (Two hyphens.)
jpayne@68 274 Terminates the list of options. This is only needed if a filename
jpayne@68 275 starts with a hyphen. For example:
jpayne@68 276
jpayne@68 277 .nf
jpayne@68 278
jpayne@68 279 xmlwf \-\- \-myfile.xml
jpayne@68 280 .fi
jpayne@68 281
jpayne@68 282 will run \fBxmlwf\fR on the file
jpayne@68 283 \*(T<\fI\-myfile.xml\fR\*(T>.
jpayne@68 284 .PP
jpayne@68 285 Older versions of \fBxmlwf\fR do not support
jpayne@68 286 reading from standard input.
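.PP
The formula given for \*(T<\fB\-a\fR\*(T> can be worked through with a
short sketch.
The function names are hypothetical, and the way the sketch combines the
factor with the \*(T<\fB\-b\fR\*(T> activation threshold is an assumption
based on the two option descriptions above, not a statement of Expat's
exact internal logic.
.nf

```python
MIB = 1024 * 1024


def amplification(direct: int, indirect: int) -> float:
    # amplification := (direct + indirect) / direct
    if direct <= 0:
        raise ValueError("direct must be a positive byte count")
    return (direct + indirect) / direct


def exceeds_limits(direct: int, indirect: int,
                   max_factor: float = 100.0,
                   threshold_bytes: int = 8 * MIB) -> bool:
    # Assumption: protection only applies once the total output bytes
    # reach the activation threshold, and then only when the factor is
    # above the maximum tolerated value.
    total = direct + indirect
    if total < threshold_bytes:
        return False
    return amplification(direct, indirect) > max_factor


if __name__ == "__main__":
    # A small document with heavy expansion stays under the threshold.
    print(amplification(1000, 99000), exceeds_limits(1000, 99000))
    # A larger payload with a factor above the default maximum.
    print(amplification(100000, 20000000), exceeds_limits(100000, 20000000))
```
.fi
.PP
The defaults used here (100.0 and 8 MiB) are the ones documented for
\*(T<\fB\-a\fR\*(T> and \*(T<\fB\-b\fR\*(T>.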
jpayne@68 287 .SH OUTPUT
jpayne@68 288 \fBxmlwf\fR outputs nothing for files which are problem-free.
jpayne@68 289 If any input file is not well-formed, or if the output for any
jpayne@68 290 input file cannot be opened, \fBxmlwf\fR prints a single
jpayne@68 291 line describing the problem to standard output.
jpayne@68 292 .PP
jpayne@68 293 If the \*(T<\fB\-k\fR\*(T> option is not provided, \fBxmlwf\fR
jpayne@68 294 halts upon encountering a well-formedness or output-file error.
jpayne@68 295 If \*(T<\fB\-k\fR\*(T> is provided, \fBxmlwf\fR continues
jpayne@68 296 processing the remaining input files, describing problems found with any of them.
jpayne@68 297 .SH "EXIT STATUS"
jpayne@68 298 For options \*(T<\fB\-v\fR\*(T>|\*(T<\fB\-\-version\fR\*(T> or \*(T<\fB\-h\fR\*(T>|\*(T<\fB\-\-help\fR\*(T>, \fBxmlwf\fR always exits with status code 0. For other cases, the following exit status codes are returned:
jpayne@68 299 .TP
jpayne@68 300 \*(T<\fB0\fR\*(T>
jpayne@68 301 The input files are well-formed and the output (if requested) was written successfully.
jpayne@68 302 .TP
jpayne@68 303 \*(T<\fB1\fR\*(T>
jpayne@68 304 An internal error occurred.
jpayne@68 305 .TP
jpayne@68 306 \*(T<\fB2\fR\*(T>
jpayne@68 307 One or more input files were not well-formed or could not be parsed.
jpayne@68 308 .TP
jpayne@68 309 \*(T<\fB3\fR\*(T>
jpayne@68 310 If using the \*(T<\fB\-d\fR\*(T> option, an error occurred opening an output file.
jpayne@68 311 .TP
jpayne@68 312 \*(T<\fB4\fR\*(T>
jpayne@68 313 There was a command-line argument error in how \fBxmlwf\fR was invoked.
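.PP
A calling script may want to turn these status codes into messages.
The following is a minimal sketch; the table and the function name
\*(T<describe_exit_status\*(T> are hypothetical, and the wording of each
message simply paraphrases the list above.
.nf

```python
# Exit status codes as documented in this section.
EXIT_STATUS = {
    0: "input well-formed; output (if requested) written successfully",
    1: "internal error",
    2: "one or more input files not well-formed or not parseable",
    3: "error opening an output file (with -d)",
    4: "command-line argument error",
}


def describe_exit_status(code: int) -> str:
    # Fall back to a generic message for codes not listed above.
    return EXIT_STATUS.get(code, "unknown exit status {}".format(code))


if __name__ == "__main__":
    for code in (0, 2, 4, 99):
        print(code, describe_exit_status(code))
```
.fi
.PP
The code passed in would typically be the return code of an
\fBxmlwf\fR child process.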
jpayne@68 314 .SH BUGS
jpayne@68 315 The errors should go to standard error, not standard output.
jpayne@68 316 .PP
jpayne@68 317 There should be a way to get \*(T<\fB\-d\fR\*(T> to send its
jpayne@68 318 output to standard output rather than forcing the user to send
jpayne@68 319 it to a file.
jpayne@68 320 .PP
jpayne@68 321 I have no idea why anyone would want to use the
jpayne@68 322 \*(T<\fB\-d\fR\*(T>, \*(T<\fB\-c\fR\*(T>, and
jpayne@68 323 \*(T<\fB\-m\fR\*(T> options. If someone could explain it to
jpayne@68 324 me, I'd like to add this information to this manpage.
jpayne@68 325 .SH "SEE ALSO"
jpayne@68 326 .nf
jpayne@68 327
jpayne@68 328 The Expat home page: https://libexpat.github.io/
jpayne@68 329 The W3C XML 1.0 specification (fourth edition): https://www.w3.org/TR/2006/REC\-xml\-20060816/
jpayne@68 330 Billion laughs attack: https://en.wikipedia.org/wiki/Billion_laughs_attack
jpayne@68 331 .fi
jpayne@68 332 .SH AUTHOR
jpayne@68 333 This manual page was originally written by Scott Bronson <\*(T<bronson@rinspin.com\*(T>>
jpayne@68 334 in December 2001 for
jpayne@68 335 the Debian GNU/Linux system (but may be used by others). Permission is
jpayne@68 336 granted to copy, distribute and/or modify this document under
jpayne@68 337 the terms of the GNU Free Documentation
jpayne@68 338 License, Version 1.1.