comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/xmlwf.1 @ 68:5028fdace37b

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 16:23:26 -0400
'\" -*- coding: us-ascii -*-
.if \n(.g .ds T< \\FC
.if \n(.g .ds T> \\F[\n[.fam]]
.de URL
\\$2 \(la\\$1\(ra\\$3
..
.if \n(.g .mso www.tmac
.TH XMLWF 1 "November 6, 2024" "" ""
.SH NAME
xmlwf \- Determines if an XML document is well-formed
.SH SYNOPSIS
'nh
.fi
.ad l
\fBxmlwf\fR \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
[\fIOPTIONS\fR] [\fIFILE\fR ...]
'in \n(.iu-\nxu
.ad b
'hy
'nh
.fi
.ad l
\fBxmlwf\fR \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
\fB-h\fR | \fB--help\fR
'in \n(.iu-\nxu
.ad b
'hy
'nh
.fi
.ad l
\fBxmlwf\fR \kx
.if (\nx>(\n(.l/2)) .nr x (\n(.l/5)
'in \n(.iu+\nxu
\fB-v\fR | \fB--version\fR
'in \n(.iu-\nxu
.ad b
'hy
.SH DESCRIPTION
\fBxmlwf\fR uses the Expat library to
determine if an XML document is well-formed. It is
non-validating.
.PP
If you do not specify any files on the command line, a recent
version of \fBxmlwf\fR reads the input from standard input.
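.PP
As a rough illustration, the following Python sketch performs the same
kind of non-validating check through xml.parsers.expat, the Python
standard-library binding to Expat.
The helper names check_well_formed and main and the
name:line:column message format are assumptions made for this example.
They are not \fBxmlwf\fR's own code or output format.
.nf
```python
import sys
import xml.parsers.expat as expat

# Sketch only: hypothetical helpers built on Python's Expat binding.


def check_well_formed(data, name="STDIN"):
    """Return None if data is well-formed, else a one-line message."""
    parser = expat.ParserCreate()  # a non-validating Expat parser
    try:
        parser.Parse(data, True)  # True marks the final (only) chunk
    except expat.ExpatError as err:
        # The exception carries Expat's error code and position
        return "%s:%d:%d: %s" % (name, err.lineno, err.offset,
                                 expat.ErrorString(err.code))
    return None


def main(paths):
    """Check each named file, or standard input when none are given."""
    if not paths:
        return [check_well_formed(sys.stdin.buffer.read())]
    results = []
    for path in paths:
        with open(path, "rb") as f:
            results.append(check_well_formed(f.read(), path))
    return results
```
.fi
Calling main(sys.argv[1:]) from a script falls back to standard
input when no files are named, as described above.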
.SH "WELL-FORMED DOCUMENTS"
A well-formed document must adhere to the
following rules:
.TP 0.2i
\(bu
The file should begin with an XML declaration. For instance,
\*(T<<?xml version="1.0" standalone="yes"?>\*(T>.
\fINOTE\fR:
The XML 1.0 specification recommends this declaration but does not
require it, and \fBxmlwf\fR does not report a document that lacks one.
.TP 0.2i
\(bu
Every start tag is either empty (<tag/>)
or has a corresponding end tag.
.TP 0.2i
\(bu
There is exactly one root element. This element must contain
all other elements in the document. Only comments, white
space, and processing instructions may come after the close
of the root element.
.TP 0.2i
\(bu
All elements nest properly.
.TP 0.2i
\(bu
All attribute values are enclosed in quotes (either single
or double).
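.PP
To see these rules enforced, the Python sketch below feeds one small
sample document per rule to Expat through the standard-library
xml.parsers.expat module and prints the message Expat reports, or None.
The sample documents and the first_error helper are made up for this
illustration.
.nf
```python
import xml.parsers.expat as expat

# Hypothetical samples: "ok" follows every rule, each other one breaks one.
SAMPLES = {
    "ok": b'<?xml version="1.0"?><root a="1"><e/></root><!-- c -->',
    "no end tag": b"<root><e></root>",
    "two roots": b"<root/><root/>",
    "bad nesting": b"<root><a><b></a></b></root>",
    "unquoted attribute": b"<root a=1/>",
}


def first_error(data):
    """Return Expat's error message for data, or None if well-formed."""
    try:
        expat.ParserCreate().Parse(data, True)
    except expat.ExpatError as err:
        return expat.ErrorString(err.code)
    return None


for label, doc in SAMPLES.items():
    print(label, "->", first_error(doc))
```
.fi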
.PP
If the document has a DTD, and it strictly complies with that
DTD, then the document is also considered \fIvalid\fR.
\fBxmlwf\fR is a non-validating parser:
it does not check the DTD. However, it does support
external entities (see the \*(T<\fB\-x\fR\*(T> option).
.SH OPTIONS
When an option takes an argument, you may specify the argument either
separately ("\*(T<\fB\-d\fR\*(T> \fIoutput\fR") or concatenated with the
option ("\*(T<\fB\-d\fR\*(T>\fIoutput\fR"). \fBxmlwf\fR
supports both.
.TP
\*(T<\fB\-a\fR\*(T> \fIfactor\fR
Sets the maximum tolerated amplification factor
for protection against billion laughs attacks (default: 100.0).
The amplification factor is calculated during parsing as:

.nf

amplification := (direct + indirect) / direct

.fi

where <direct> is the number of bytes read
from the primary document and
<indirect> is the number of bytes
added by expanding entities and reading external DTD files,
combined.

\fINOTE\fR:
If you ever need to increase this value for a non-attack payload,
please file a bug report.
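
The formula above is plain arithmetic. As a sketch in Python (the
amplification helper is hypothetical, not Expat's implementation),
a document that adds 99 KiB of entity expansion to 1 KiB of input
sits exactly at the default limit:
.nf
```python
# Hypothetical helper mirroring the -a formula; not Expat's own code.
DEFAULT_MAX_AMPLIFICATION = 100.0  # default for -a


def amplification(direct, indirect):
    """(direct + indirect) / direct, as defined for -a.

    direct: bytes read from the primary document
    indirect: bytes added by entity expansion and external DTD reading
    """
    if direct <= 0:
        raise ValueError("direct must be positive")
    return (direct + indirect) / direct


factor = amplification(1024, 99 * 1024)
print(factor, factor <= DEFAULT_MAX_AMPLIFICATION)  # prints: 100.0 True
```
.fi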
.TP
\*(T<\fB\-b\fR\*(T> \fIbytes\fR
Sets the number of output bytes (including amplification)
needed to activate protection against billion laughs attacks
(default: 8 MiB).
This can be thought of as an "activation threshold".

\fINOTE\fR:
If you ever need to increase this value for a non-attack payload,
please file a bug report.
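
The following Python sketch combines the two settings under one
assumption: that the \*(T<\fB\-a\fR\*(T> limit is only enforced once the
output has reached the \*(T<\fB\-b\fR\*(T> threshold, as the term
"activation threshold" suggests.
The tolerated helper is hypothetical and is not Expat's own code.
.nf
```python
# Hypothetical sketch of the assumed -a / -b interplay; not Expat's code.
DEFAULT_MAX_AMPLIFICATION = 100.0               # -a default
DEFAULT_ACTIVATION_THRESHOLD = 8 * 1024 * 1024  # -b default: 8 MiB


def tolerated(direct, indirect,
              max_factor=DEFAULT_MAX_AMPLIFICATION,
              threshold=DEFAULT_ACTIVATION_THRESHOLD):
    """Assumed rule: accept while output is below the threshold,
    otherwise only while the amplification stays within max_factor."""
    total = direct + indirect  # output bytes, including amplification
    if total < threshold:
        return True  # protection not yet active
    return total / direct <= max_factor


print(tolerated(100, 1_000_000))   # prints True (below 8 MiB)
print(tolerated(100, 10_000_000))  # prints False (active, factor too high)
```
.fi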
.TP
\*(T<\fB\-c\fR\*(T>
If the input file is well-formed and \fBxmlwf\fR
doesn't encounter any errors, the input file is simply copied to
the output directory unchanged.
This implies no namespaces (turns off \*(T<\fB\-n\fR\*(T>) and
requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
.TP
\*(T<\fB\-d\fR\*(T> \fIoutput-dir\fR
Specifies a directory to contain transformed
representations of the input files.
By default, \*(T<\fB\-d\fR\*(T> outputs a canonical representation
(described below).
You can select different output formats using \*(T<\fB\-c\fR\*(T>,
\*(T<\fB\-m\fR\*(T> and \*(T<\fB\-N\fR\*(T>.

The output filenames will
be exactly the same as the input filenames, or "STDIN" if the input is
coming from standard input. Therefore, you must be careful that the
output file does not go into the same directory as the input
file. Otherwise, \fBxmlwf\fR will overwrite (truncate) the
input file before parsing it, destroying its contents (just like running
\*(T<cat < file > file\*(T> in most shells).

Two structurally equivalent XML documents have a byte-for-byte
identical canonical XML representation.
Note that ignorable white space is considered significant and
is treated equivalently to data.
More on canonical XML can be found at
http://www.jclark.com/xml/canonxml.html .
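
The canonical form can be approximated with Python's xml.parsers.expat.
This sketch follows the canonical XML description referenced above as
we understand it (sorted attributes, escaped special characters, empty
elements written as start/end pairs, comments and declarations dropped).
It is not \fBxmlwf\fR's own output code, it omits what
\*(T<\fB\-N\fR\*(T> adds, and canonicalize is a hypothetical helper.
.nf
```python
import xml.parsers.expat as expat

# Characters escaped in canonical text and attribute values (our reading
# of the canonical XML description; hypothetical sketch, not xmlwf code).
ESCAPES = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    chr(9): "&#9;", chr(10): "&#10;", chr(13): "&#13;",
}


def escape(text):
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def canonicalize(data):
    """Return a canonical-XML-style string for the bytes in data."""
    out = []
    parser = expat.ParserCreate()

    def start(name, attrs):
        # Attributes in sorted order, values double-quoted and escaped
        pairs = "".join(' %s="%s"' % (k, escape(v))
                        for k, v in sorted(attrs.items()))
        out.append("<%s%s>" % (name, pairs))

    def end(name):
        out.append("</%s>" % name)  # so <e/> becomes <e></e>

    def text(chunk):
        out.append(escape(chunk))

    def pi(target, body):
        out.append("<?%s %s?>" % (target, body))

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.ProcessingInstructionHandler = pi
    # No comment handler: comments are simply not reproduced
    parser.Parse(data, True)
    return "".join(out)
```
.fi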
.TP
\*(T<\fB\-e\fR\*(T> \fIencoding\fR
Specifies the character encoding for the document, overriding
any document encoding declaration. \fBxmlwf\fR
supports four built-in encodings:
\*(T<US\-ASCII\*(T>,
\*(T<UTF\-8\*(T>,
\*(T<UTF\-16\*(T>, and
\*(T<ISO\-8859\-1\*(T>.
Also see the \*(T<\fB\-w\fR\*(T> option.
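
Python's xml.parsers.expat.ParserCreate accepts an encoding argument
that likewise overrides the document's own encoding. In this sketch, a
byte that is valid ISO\-8859\-1 but not valid UTF\-8 is accepted only
when the override is given. DOC and parse_text are made up for the
example.
.nf
```python
import xml.parsers.expat as expat

# Hypothetical sample: "caf" plus byte 0xE9 (e-acute in ISO-8859-1),
# which is not a valid UTF-8 sequence at this position.
DOC = b"<r>caf" + bytes([0xE9]) + b"</r>"


def parse_text(data, encoding=None):
    """Parse data, optionally forcing an encoding as -e does."""
    parser = expat.ParserCreate(encoding=encoding)  # overrides declarations
    chunks = []
    parser.CharacterDataHandler = chunks.append
    parser.Parse(data, True)  # pass bytes so Expat does the decoding
    return "".join(chunks)


print(parse_text(DOC, "ISO-8859-1") == "caf" + chr(0xE9))
try:
    parse_text(DOC)  # no override and no declaration: UTF-8 is assumed
except expat.ExpatError as err:
    print("rejected:", expat.ErrorString(err.code))
```
.fi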
.TP
\*(T<\fB\-g\fR\*(T> \fIbytes\fR
Sets the buffer size to request per call pair to
\*(T<\fBXML_GetBuffer\fR\*(T> and \*(T<\fBread\fR\*(T>
(default: 8 KiB).
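
Python's xml.parsers.expat does not expose
\*(T<\fBXML_GetBuffer\fR\*(T> directly, so the sketch below only
approximates this read loop: it reads the input in chunks of a chosen
size and feeds each one to Expat. parse_stream and its read counter
are hypothetical.
.nf
```python
import io
import xml.parsers.expat as expat

DEFAULT_BUFFER_SIZE = 8 * 1024  # -g default: 8 KiB


def parse_stream(stream, bufsize=DEFAULT_BUFFER_SIZE):
    """Feed a binary stream to Expat in chunks of at most bufsize bytes.

    Hypothetical helper: returns how many read() calls were made.
    """
    parser = expat.ParserCreate()
    calls = 0
    while True:
        chunk = stream.read(bufsize)
        calls += 1
        if not chunk:
            parser.Parse(b"", True)  # signal end of input
            return calls
        parser.Parse(chunk, False)  # more input may follow


doc = b"<r>" + b"<e/>" * 1000 + b"</r>"  # 4007 bytes
print(parse_stream(io.BytesIO(doc), bufsize=1024))  # prints 5
```
.fi
Smaller buffers mean more read and parse calls for the same input.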
.TP
\*(T<\fB\-h\fR\*(T>, \*(T<\fB\-\-help\fR\*(T>
Prints short usage information for \fBxmlwf\fR
(similar to this man page, but more concise)
and then exits.
.TP
\*(T<\fB\-k\fR\*(T>
When processing multiple files, \fBxmlwf\fR
by default halts after the first file with an error.
This option tells \fBxmlwf\fR to report the error
but to keep processing.
This can be useful, for example, when testing a filter that converts
many files to XML and you want to quickly find out which conversions
failed.
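
The halt-or-continue behavior can be sketched in Python with
xml.parsers.expat. check_all is a hypothetical helper that checks
in-memory documents instead of files. Its keep_going flag plays the
role of \*(T<\fB\-k\fR\*(T>.
.nf
```python
import xml.parsers.expat as expat


def check_all(documents, keep_going=False):
    """Check (name, bytes) pairs in order; hypothetical -k analogue.

    Returns the error lines reported before stopping.
    """
    errors = []
    for name, data in documents:
        try:
            expat.ParserCreate().Parse(data, True)
        except expat.ExpatError as err:
            errors.append("%s: %s" % (name, expat.ErrorString(err.code)))
            if not keep_going:
                break  # default: halt after the first bad document
    return errors


docs = [("a.xml", b"<a>"), ("b.xml", b"<b/>"), ("c.xml", b"<c></d>")]
print(check_all(docs))                   # stops after a.xml
print(check_all(docs, keep_going=True))  # reports a.xml and c.xml
```
.fi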
.TP
\*(T<\fB\-m\fR\*(T>
Outputs an XML file in a special descriptive format that completely
describes the input file, including character positions.
Requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
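.TP
\*(T<\fB\-n\fR\*(T>
Turns on namespace processing: element and attribute names are
resolved against their namespace declarations, and a document that
uses an undeclared namespace prefix is reported as an error.
\*(T<\fB\-c\fR\*(T> disables namespaces.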
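
With Python's xml.parsers.expat, passing a namespace_separator to
ParserCreate turns on Expat's namespace processing, which is comparable
to \*(T<\fB\-n\fR\*(T>. The separator (a space here) and the
element_names helper are choices made for this sketch. \fBxmlwf\fR
itself may use a different separator internally.
.nf
```python
import xml.parsers.expat as expat

DOC = b'<p:root xmlns:p="http://example.com/ns"><p:item/></p:root>'


def element_names(data, namespaces):
    """Hypothetical helper: list element names as Expat reports them."""
    names = []
    # A separator enables namespace processing; names then arrive
    # as "URI<separator>local-name".
    parser = expat.ParserCreate(namespace_separator=" " if namespaces else None)
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(data, True)
    return names


print(element_names(DOC, namespaces=False))
print(element_names(DOC, namespaces=True))

# With namespace processing, an undeclared prefix is an error
try:
    element_names(b"<q:root/>", namespaces=True)
except expat.ExpatError as err:
    print(expat.ErrorString(err.code))
```
.fi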
.TP
\*(T<\fB\-N\fR\*(T>
Adds a doctype declaration and notation declarations to canonical XML output.
This matches the example output used by the formal XML test cases.
Requires \*(T<\fB\-d\fR\*(T> to specify an output directory.
.TP
\*(T<\fB\-p\fR\*(T>
Tells \fBxmlwf\fR to process external DTDs and parameter
entities.

Normally \fBxmlwf\fR never parses parameter
entities. \*(T<\fB\-p\fR\*(T> tells it to always parse them.
\*(T<\fB\-p\fR\*(T> implies \*(T<\fB\-x\fR\*(T>.
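
The Python sketch below illustrates what parameter entity parsing
changes, using xml.parsers.expat. With parameter entity parsing set to
"always" (roughly what \*(T<\fB\-p\fR\*(T> requests), Expat asks the
external entity handler for the external DTD subset, so an entity
declared there becomes available. Without it, the reference is skipped.
The in-memory FILES dictionary and the text_of helper are hypothetical
stand-ins for real files.
.nf
```python
import xml.parsers.expat as expat

# Hypothetical in-memory "files" standing in for external resources
FILES = {"doc.dtd": b'<!ENTITY greeting "hello">'}
DOC = b'<!DOCTYPE doc SYSTEM "doc.dtd"><doc>&greeting;</doc>'


def text_of(data, parse_params):
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append

    def load(context, base, system_id, public_id):
        # Called for the external DTD subset (context is None here)
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(FILES[system_id], True)
        return 1  # nonzero: entity handled successfully

    parser.ExternalEntityRefHandler = load
    if parse_params:
        parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.Parse(data, True)
    return "".join(chunks)


print(repr(text_of(DOC, parse_params=False)))  # entity skipped
print(repr(text_of(DOC, parse_params=True)))   # entity expanded
```
.fi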
.TP
\*(T<\fB\-q\fR\*(T>
Disables reparse deferral, allowing quadratic parse runtime
on large tokens (default: reparse deferral enabled).
.TP
\*(T<\fB\-r\fR\*(T>
Normally \fBxmlwf\fR memory-maps the XML file
before parsing; this can result in faster parsing on many
platforms.
\*(T<\fB\-r\fR\*(T> turns off memory-mapping and uses normal file
I/O calls instead.
Memory-mapping is always turned off
when reading from standard input.

Use of memory-mapping can cause some platforms to report
substantially higher memory usage for
\fBxmlwf\fR, but this appears to be a matter of
the operating system reporting memory in a strange way; there is
not a leak in \fBxmlwf\fR.
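
The sketch below contrasts the two input strategies using Python's
mmap module and xml.parsers.expat. It is an analogy for what
\*(T<\fB\-r\fR\*(T> switches between, not \fBxmlwf\fR's code.
count_elements is a hypothetical helper, and the fallback for empty
files reflects that Python's mmap refuses to map a zero-length file.
.nf
```python
import mmap
import os
import tempfile
import xml.parsers.expat as expat


def count_elements(path, use_mmap=True):
    """Hypothetical helper: parse a file and count its elements.

    use_mmap=False mimics -r and reads through ParseFile instead.
    """
    parser = expat.ParserCreate()
    count = 0

    def start(name, attrs):
        nonlocal count
        count += 1

    parser.StartElementHandler = start
    with open(path, "rb") as f:
        mapped = None
        if use_mmap:
            try:
                mapped = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            except ValueError:
                mapped = None  # empty files cannot be mapped
        if mapped is not None:
            with mapped:
                parser.Parse(mapped, True)  # bytes come from the mapping
        else:
            parser.ParseFile(f)  # ordinary read() calls
    return count


# Demonstration on a temporary file
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "doc.xml")
    with open(path, "wb") as out:
        out.write(b"<r><e/><e/></r>")
    results = [count_elements(path), count_elements(path, use_mmap=False)]
print(results)  # prints [3, 3]
```
.fi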
.TP
\*(T<\fB\-s\fR\*(T>
Prints an error if the document is not standalone.
A document is standalone if it has no external subset and no
references to parameter entities.
.TP
\*(T<\fB\-t\fR\*(T>
Turns on timings. This tells Expat to parse the entire file,
but not perform any processing.
This gives a fairly accurate idea of the raw speed of Expat itself
without client overhead.
\*(T<\fB\-t\fR\*(T> turns off most of the output options
(\*(T<\fB\-d\fR\*(T>, \*(T<\fB\-m\fR\*(T>, \*(T<\fB\-c\fR\*(T>, ...).
.TP
\*(T<\fB\-v\fR\*(T>, \*(T<\fB\-\-version\fR\*(T>
Prints the version of the Expat library being used, including some
information on the compile-time configuration of the library, and
then exits.
.TP
\*(T<\fB\-w\fR\*(T>
Enables support for Windows code pages.
Normally, \fBxmlwf\fR reports an error if it
encounters an encoding that it is not equipped to handle itself. With
\*(T<\fB\-w\fR\*(T>, \fBxmlwf\fR will try to use a Windows code
page. See also \*(T<\fB\-e\fR\*(T>.
.TP
\*(T<\fB\-x\fR\*(T>
Turns on parsing external entities.

Non-validating parsers are not required to resolve external
entities.
Expat always expands internal entities,
but external entity parsing must be enabled explicitly.

External entities are simply entities that obtain their
data from outside the XML file currently being parsed.

This is an example of an internal entity:

.nf

<!ENTITY vers '1.0.2'>
.fi

And here are some examples of external entities:

.nf

<!ENTITY header SYSTEM "header\-&vers;.xml"> (parsed)
<!ENTITY logo SYSTEM "logo.png" NDATA PNG> (unparsed)
.fi
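
In the Python sketch below, an external parsed entity is resolved only
when an external entity handler is installed, which is roughly what
\*(T<\fB\-x\fR\*(T> turns on. Without a handler, Expat skips the
reference. FILES is a hypothetical in-memory stand-in for real files,
and text_of is a helper made up for this example.
.nf
```python
import xml.parsers.expat as expat

# Hypothetical in-memory "files" standing in for external resources
FILES = {"header.xml": b"<h>Header text</h>"}
DOC = (b'<!DOCTYPE doc [<!ENTITY header SYSTEM "header.xml">]>'
       b"<doc>&header;</doc>")


def text_of(data, external):
    chunks = []
    parser = expat.ParserCreate()
    parser.CharacterDataHandler = chunks.append
    if external:
        def load(context, base, system_id, public_id):
            # Parse the referenced entity with a child parser
            sub = parser.ExternalEntityParserCreate(context)
            sub.Parse(FILES[system_id], True)
            return 1  # nonzero: entity handled successfully
        parser.ExternalEntityRefHandler = load
    parser.Parse(data, True)
    return "".join(chunks)


print(repr(text_of(DOC, external=False)))  # reference skipped
print(repr(text_of(DOC, external=True)))   # entity content included
```
.fi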
.TP
\*(T<\fB\-\-\fR\*(T>
(Two hyphens.)
Terminates the list of options. This is only needed if a filename
starts with a hyphen. For example:

.nf

xmlwf \-\- \-myfile.xml
.fi

will run \fBxmlwf\fR on the file
\*(T<\fI\-myfile.xml\fR\*(T>.
.PP
Older versions of \fBxmlwf\fR do not support
reading from standard input.
.SH OUTPUT
\fBxmlwf\fR outputs nothing for files which are problem-free.
If any input file is not well-formed, or if the output for any
input file cannot be opened, \fBxmlwf\fR prints a single
line describing the problem to standard output.
.PP
If the \*(T<\fB\-k\fR\*(T> option is not provided, \fBxmlwf\fR
halts upon encountering a well-formedness or output-file error.
If \*(T<\fB\-k\fR\*(T> is provided, \fBxmlwf\fR continues
processing the remaining input files, describing problems found with any of them.
.SH "EXIT STATUS"
For options \*(T<\fB\-v\fR\*(T>|\*(T<\fB\-\-version\fR\*(T> or \*(T<\fB\-h\fR\*(T>|\*(T<\fB\-\-help\fR\*(T>, \fBxmlwf\fR always exits with status code 0. For other cases, the following exit status codes are returned:
.TP
\*(T<\fB0\fR\*(T>
The input files are well-formed and the output (if requested) was written successfully.
.TP
\*(T<\fB1\fR\*(T>
An internal error occurred.
.TP
\*(T<\fB2\fR\*(T>
One or more input files were not well-formed or could not be parsed.
.TP
\*(T<\fB3\fR\*(T>
If using the \*(T<\fB\-d\fR\*(T> option, an error occurred opening an output file.
.TP
\*(T<\fB4\fR\*(T>
There was a command-line argument error in how \fBxmlwf\fR was invoked.
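.PP
A calling script can map these codes back to their meanings. The
Python sketch below assumes an \fBxmlwf\fR executable on the PATH.
run_xmlwf and describe are hypothetical helpers, and the descriptions
paraphrase the list above.
.nf
```python
import subprocess

# Meanings paraphrased from the EXIT STATUS list (hypothetical table)
EXIT_STATUS = {
    0: "well-formed; requested output written",
    1: "internal error",
    2: "not well-formed or could not be parsed",
    3: "could not open an output file (-d)",
    4: "command-line argument error",
}


def describe(code):
    """Translate an xmlwf exit status into a short description."""
    return EXIT_STATUS.get(code, "unexpected exit status %d" % code)


def run_xmlwf(*args):
    """Run xmlwf (assumed to be on PATH) and describe its exit status."""
    result = subprocess.run(["xmlwf", *args], capture_output=True, text=True)
    # Per the BUGS section, error lines arrive on standard output
    return result.returncode, describe(result.returncode), result.stdout
```
.fi
For example, run_xmlwf("\-k", "a.xml", "b.xml") would return the
status code, its description, and any error lines that were printed.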
.SH BUGS
The errors should go to standard error, not standard output.
.PP
There should be a way to get \*(T<\fB\-d\fR\*(T> to send its
output to standard output rather than forcing the user to send
it to a file.
.PP
I have no idea why anyone would want to use the
\*(T<\fB\-d\fR\*(T>, \*(T<\fB\-c\fR\*(T>, and
\*(T<\fB\-m\fR\*(T> options. If someone could explain it to
me, I'd like to add this information to this manpage.
.SH "SEE ALSO"
.nf

The Expat home page: https://libexpat.github.io/
The W3 XML 1.0 specification (fourth edition): https://www.w3.org/TR/2006/REC\-xml\-20060816/
Billion laughs attack: https://en.wikipedia.org/wiki/Billion_laughs_attack
.fi
.SH AUTHOR
This manual page was originally written by Scott Bronson <\*(T<bronson@rinspin.com\*(T>>
in December 2001 for
the Debian GNU/Linux system (but may be used by others). Permission is
granted to copy, distribute and/or modify this document under
the terms of the GNU Free Documentation
License, Version 1.1.