comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/share/man/man1/xmlwf.1 @ 68:5028fdace37b
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 16:23:26 -0400 |
parents | |
children |
67:0e9998148a16 | 68:5028fdace37b |
---|---|
1 '\" -*- coding: us-ascii -*- | |
2 .if \n(.g .ds T< \\FC | |
3 .if \n(.g .ds T> \\F[\n[.fam]] | |
4 .de URL | |
5 \\$2 \(la\\$1\(ra\\$3 | |
6 .. | |
7 .if \n(.g .mso www.tmac | |
8 .TH XMLWF 1 "November 6, 2024" "" "" | |
9 .SH NAME | |
10 xmlwf \- Determines if an XML document is well-formed | |
11 .SH SYNOPSIS | |
12 'nh | |
13 .fi | |
14 .ad l | |
15 \fBxmlwf\fR \kx | |
16 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5) | |
17 'in \n(.iu+\nxu | |
18 [\fIOPTIONS\fR] [\fIFILE\fR ...] | |
19 'in \n(.iu-\nxu | |
20 .ad b | |
21 'hy | |
22 'nh | |
23 .fi | |
24 .ad l | |
25 \fBxmlwf\fR \kx | |
26 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5) | |
27 'in \n(.iu+\nxu | |
28 \fB-h\fR | \fB--help\fR | |
29 'in \n(.iu-\nxu | |
30 .ad b | |
31 'hy | |
32 'nh | |
33 .fi | |
34 .ad l | |
35 \fBxmlwf\fR \kx | |
36 .if (\nx>(\n(.l/2)) .nr x (\n(.l/5) | |
37 'in \n(.iu+\nxu | |
38 \fB-v\fR | \fB--version\fR | |
39 'in \n(.iu-\nxu | |
40 .ad b | |
41 'hy | |
42 .SH DESCRIPTION | |
43 \fBxmlwf\fR uses the Expat library to | |
44 determine if an XML document is well-formed. It is | |
45 non-validating. | |
46 .PP | |
47 If you do not specify any files on the command-line, and you | |
48 have a recent version of \fBxmlwf\fR, the | |
49 input file will be read from standard input. | |
50 .SH "WELL-FORMED DOCUMENTS" | |
51 A well-formed document must adhere to the | |
52 following rules: | |
53 .TP 0.2i | |
54 \(bu | |
55 The file begins with an XML declaration. For instance, | |
56 \*(T<<?xml version="1.0" standalone="yes"?>\*(T>. | |
57 \fINOTE\fR: | |
58 \fBxmlwf\fR does not currently | |
59 check for a valid XML declaration. | |
60 .TP 0.2i | |
61 \(bu | |
62 Every start tag is either empty (<tag/>) | |
63 or has a corresponding end tag. | |
64 .TP 0.2i | |
65 \(bu | |
66 There is exactly one root element. This element must contain | |
67 all other elements in the document. Only comments, white | |
68 space, and processing instructions may come after the close | |
69 of the root element. | |
70 .TP 0.2i | |
71 \(bu | |
72 All elements nest properly. | |
73 .TP 0.2i | |
74 \(bu | |
75 All attribute values are enclosed in quotes (either single | |
76 or double). | |
77 .PP | |
78 If the document has a DTD, and it strictly complies with that | |
79 DTD, then the document is also considered \fIvalid\fR. | |
80 \fBxmlwf\fR is a non-validating parser -- | |
81 it does not check the DTD. However, it does support | |
82 external entities (see the \*(T<\fB\-x\fR\*(T> option). | |
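The rules above can be tried out without xmlwf itself. The following is a minimal sketch that assumes Python's standard `xml.parsers.expat` binding to the same Expat library; `is_well_formed` is a hypothetical helper name, not part of xmlwf or Expat.

```python
import xml.parsers.expat


def is_well_formed(document: bytes) -> bool:
    """Return True if Expat parses the whole document without an error."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # The second argument marks this chunk as the final one.
        parser.Parse(document, True)
    except xml.parsers.expat.ExpatError:
        return False
    return True


if __name__ == "__main__":
    samples = [
        b"<root><item/></root>",   # empty tag inside a single root
        b"<root><item></root>",    # start tag without a matching end tag
        b"<a/><b/>",               # more than one root element
        b"<root attr=1/>",         # attribute value not quoted
    ]
    for sample in samples:
        print(sample, is_well_formed(sample))
```

Like xmlwf, this check is non-validating: it never compares the document against a DTD.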
83 .SH OPTIONS | |
84 When an option includes an argument, you may specify the argument either | |
85 separately ("\*(T<\fB\-d\fR\*(T> \fIoutput\fR") or concatenated with the | |
86 option ("\*(T<\fB\-d\fR\*(T>\fIoutput\fR"). \fBxmlwf\fR | |
87 supports both. | |
88 .TP | |
89 \*(T<\fB\-a\fR\*(T> \fIfactor\fR | |
90 Sets the maximum tolerated amplification factor | |
91 for protection against billion laughs attacks (default: 100.0). | |
92 The amplification factor is calculated as | |
93 | |
94 .nf | |
95 | |
96 amplification := (direct + indirect) / direct | |
97 | |
98 .fi | |
99 | |
100 during parsing, where | |
101 <direct> is the number of bytes read | |
102 from the primary document and | |
103 <indirect> is the number of bytes | |
104 added by expanding entities and reading external DTD files, | |
105 combined. | |
106 | |
107 \fINOTE\fR: | |
108 If you ever need to increase this value for a non-attack payload, | |
109 please file a bug report. | |
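As a worked illustration of the amplification formula, here is a small Python sketch. The function name `amplification` and the handling of `direct == 0` are assumptions made for this example, not something the man page specifies.

```python
def amplification(direct: int, indirect: int) -> float:
    """Compute (direct + indirect) / direct for the given byte counts."""
    if direct == 0:
        # Assumption for this sketch: with no direct input yet,
        # treat the factor as 1.0 instead of dividing by zero.
        return 1.0
    return (direct + indirect) / direct


if __name__ == "__main__":
    # 1,000 bytes of primary document that expand into 99,000 extra bytes
    # sit exactly at the default limit of 100.0.
    print(amplification(1000, 99000))
```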
110 .TP | |
111 \*(T<\fB\-b\fR\*(T> \fIbytes\fR | |
112 Sets the number of output bytes (including amplification) | |
113 needed to activate protection against billion laughs attacks | |
114 (default: 8 MiB). | |
115 This can be thought of as an "activation threshold". | |
116 | |
117 \fINOTE\fR: | |
118 If you ever need to increase this value for a non-attack payload, | |
119 please file a bug report. | |
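One way to picture how the activation threshold and the amplification factor might interact is sketched below. It assumes that input is rejected only when the output byte count has reached the threshold and the factor exceeds the maximum; `is_tolerated` is a hypothetical name, and the defaults are the ones documented above (100.0 and 8 MiB).

```python
MIB = 1024 * 1024


def is_tolerated(direct: int, indirect: int,
                 max_factor: float = 100.0,
                 threshold_bytes: int = 8 * MIB) -> bool:
    """Return True if the byte counts would not trigger the protection."""
    output_bytes = direct + indirect
    if output_bytes < threshold_bytes:
        # Below the activation threshold the factor is not considered.
        return True
    # Assumption for this sketch: no direct bytes means a factor of 1.0.
    factor = output_bytes / direct if direct else 1.0
    return factor <= max_factor


if __name__ == "__main__":
    print(is_tolerated(1000, 1_000_000))   # high factor, but small output
    print(is_tolerated(1000, 16 * MIB))    # high factor and large output
    print(is_tolerated(8 * MIB, 8 * MIB))  # large output, factor of 2
```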
120 .TP | |
121 \*(T<\fB\-c\fR\*(T> | |
122 If the input file is well-formed and \fBxmlwf\fR | |
123 doesn't encounter any errors, the input file is simply copied to | |
124 the output directory unchanged. | |
125 This implies no namespaces (turns off \*(T<\fB\-n\fR\*(T>) and | |
126 requires \*(T<\fB\-d\fR\*(T> to specify an output directory. | |
127 .TP | |
128 \*(T<\fB\-d\fR\*(T> \fIoutput-dir\fR | |
129 Specifies a directory to contain transformed | |
130 representations of the input files. | |
131 By default, \*(T<\fB\-d\fR\*(T> outputs a canonical representation | |
132 (described below). | |
133 You can select different output formats using \*(T<\fB\-c\fR\*(T>, | |
134 \*(T<\fB\-m\fR\*(T> and \*(T<\fB\-N\fR\*(T>. | |
135 | |
136 The output filenames will | |
137 be exactly the same as the input filenames or "STDIN" if the input is | |
138 coming from standard input. Therefore, you must be careful that the | |
139 output file does not go into the same directory as the input | |
140 file. Otherwise, \fBxmlwf\fR will delete the | |
141 input file before it generates the output file (just like running | |
142 \*(T<cat < file > file\*(T> in most shells). | |
143 | |
144 Two structurally equivalent XML documents have a byte-for-byte | |
145 identical canonical XML representation. | |
146 Note that ignorable white space is considered significant and | |
147 is treated equivalently to data. | |
148 More on canonical XML can be found at | |
149 http://www.jclark.com/xml/canonxml.html . | |
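Because the output file reuses the input filename, a wrapper script may want to guard against pointing the output directory at the input's own directory. The following Python sketch shows one such guard; `output_would_clobber_input` is a hypothetical helper and is not part of xmlwf.

```python
import os


def output_would_clobber_input(input_path: str, output_dir: str) -> bool:
    """Return True if output_dir is the directory that holds input_path."""
    input_dir = os.path.dirname(os.path.abspath(input_path))
    # realpath resolves symbolic links so that two spellings of the
    # same directory compare equal.
    return os.path.realpath(input_dir) == os.path.realpath(output_dir)


if __name__ == "__main__":
    print(output_would_clobber_input("/tmp/a/doc.xml", "/tmp/a"))
    print(output_would_clobber_input("/tmp/a/doc.xml", "/tmp/b"))
```

A caller could refuse to run the tool, or pick a fresh temporary directory, whenever the check returns True.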
150 .TP | |
151 \*(T<\fB\-e\fR\*(T> \fIencoding\fR | |
152 Specifies the character encoding for the document, overriding | |
153 any document encoding declaration. \fBxmlwf\fR | |
154 supports four built-in encodings: | |
155 \*(T<US\-ASCII\*(T>, | |
156 \*(T<UTF\-8\*(T>, | |
157 \*(T<UTF\-16\*(T>, and | |
158 \*(T<ISO\-8859\-1\*(T>. | |
159 Also see the \*(T<\fB\-w\fR\*(T> option. | |
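The effect of overriding the document encoding can be illustrated with Python's `xml.parsers.expat` binding, which accepts an encoding argument in `ParserCreate`. This is a sketch of the idea only; `parse_text` is a hypothetical helper, and xmlwf's own handling of the option may differ in detail.

```python
import xml.parsers.expat


def parse_text(document: bytes, encoding=None) -> str:
    """Parse document and return its character data as one string."""
    parser = xml.parsers.expat.ParserCreate(encoding)
    chunks = []
    # Character data may arrive in several pieces, so collect and join.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


if __name__ == "__main__":
    # The byte 0xE9 is "e with acute accent" in ISO-8859-1 but is not
    # valid UTF-8 here, and the document declares no encoding.
    latin1_doc = b"<r>caf\xe9</r>"
    print(ascii(parse_text(latin1_doc, encoding="ISO-8859-1")))
    try:
        parse_text(latin1_doc)
    except xml.parsers.expat.ExpatError as err:
        print("without override:", err)
```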
160 .TP | |
161 \*(T<\fB\-g\fR\*(T> \fIbytes\fR | |
162 Sets the buffer size to request per call pair to | |
163 \*(T<\fBXML_GetBuffer\fR\*(T> and \*(T<\fBread\fR\*(T> | |
164 (default: 8 KiB). | |
165 .TP | |
166 \*(T<\fB\-h\fR\*(T>, \*(T<\fB\-\-help\fR\*(T> | |
167 Prints short usage information for \fBxmlwf\fR | |
168 and then exits. | |
169 The output is similar to this man page but more concise. | |
170 .TP | |
171 \*(T<\fB\-k\fR\*(T> | |
172 When processing multiple files, \fBxmlwf\fR | |
173 by default halts after the first file with an error. | |
174 This tells \fBxmlwf\fR to report the error | |
175 but to keep processing. | |
176 This can be useful, for example, when testing a filter that converts | |
177 many files to XML and you want to quickly find out which conversions | |
178 failed. | |
179 .TP | |
180 \*(T<\fB\-m\fR\*(T> | |
181 Outputs an XML meta-representation that completely | |
182 describes the input file, including character positions. | |
183 Requires \*(T<\fB\-d\fR\*(T> to specify an output directory. | |
184 .TP | |
185 \*(T<\fB\-n\fR\*(T> | |
186 Turns on namespace processing. | |
187 \*(T<\fB\-c\fR\*(T> disables namespaces. | |
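To give a feel for what namespace processing changes, here is a sketch using Python's `xml.parsers.expat` binding, where passing a `namespace_separator` to `ParserCreate` switches Expat into namespace mode. The helper `element_names` and the URI `urn:example` are made up for this example.

```python
import xml.parsers.expat


def element_names(document: bytes, namespaces: bool) -> list:
    """Return the element names that Expat reports for document."""
    separator = " " if namespaces else None
    parser = xml.parsers.expat.ParserCreate(namespace_separator=separator)
    names = []
    parser.StartElementHandler = lambda name, attrs: names.append(name)
    parser.Parse(document, True)
    return names


if __name__ == "__main__":
    doc = b'<a:root xmlns:a="urn:example"><a:item/></a:root>'
    # Without namespace processing the prefixed names are reported as-is.
    print(element_names(doc, namespaces=False))
    # With it, each name is the namespace URI, the separator, and the
    # local name.
    print(element_names(doc, namespaces=True))
```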
188 .TP | |
189 \*(T<\fB\-N\fR\*(T> | |
190 Adds a doctype and notation declarations to canonical XML output. | |
191 This matches the example output used by the formal XML test cases. | |
192 Requires \*(T<\fB\-d\fR\*(T> to specify an output directory. | |
193 .TP | |
194 \*(T<\fB\-p\fR\*(T> | |
195 Tells \fBxmlwf\fR to process external DTDs and parameter | |
196 entities. | |
197 | |
198 Normally \fBxmlwf\fR never parses parameter | |
199 entities. \*(T<\fB\-p\fR\*(T> tells it to always parse them. | |
200 \*(T<\fB\-p\fR\*(T> implies \*(T<\fB\-x\fR\*(T>. | |
201 .TP | |
202 \*(T<\fB\-q\fR\*(T> | |
203 Disables reparse deferral and allows quadratic parse runtime | |
204 on large tokens (default: reparse deferral enabled). | |
205 .TP | |
206 \*(T<\fB\-r\fR\*(T> | |
207 Normally \fBxmlwf\fR memory-maps the XML file | |
208 before parsing; this can result in faster parsing on many | |
209 platforms. | |
210 \*(T<\fB\-r\fR\*(T> turns off memory-mapping and uses normal file | |
211 IO calls instead. | |
212 Of course, memory-mapping is automatically turned off | |
213 when reading from standard input. | |
214 | |
215 Use of memory-mapping can cause some platforms to report | |
216 substantially higher memory usage for | |
217 \fBxmlwf\fR, but this appears to be a matter of | |
218 the operating system reporting memory in a strange way; there is | |
219 not a leak in \fBxmlwf\fR. | |
220 .TP | |
221 \*(T<\fB\-s\fR\*(T> | |
222 Prints an error if the document is not standalone. | |
223 A document is standalone if it has no external subset and no | |
224 references to parameter entities. | |
225 .TP | |
226 \*(T<\fB\-t\fR\*(T> | |
227 Turns on timings. This tells Expat to parse the entire file, | |
228 but not perform any processing. | |
229 This gives a fairly accurate idea of the raw speed of Expat itself | |
230 without client overhead. | |
231 \*(T<\fB\-t\fR\*(T> turns off most of the output options | |
232 (\*(T<\fB\-d\fR\*(T>, \*(T<\fB\-m\fR\*(T>, \*(T<\fB\-c\fR\*(T>, ...). | |
233 .TP | |
234 \*(T<\fB\-v\fR\*(T>, \*(T<\fB\-\-version\fR\*(T> | |
235 Prints the version of the Expat library being used, including some | |
236 information on the compile-time configuration of the library, and | |
237 then exits. | |
238 .TP | |
239 \*(T<\fB\-w\fR\*(T> | |
240 Enables support for Windows code pages. | |
241 Normally, \fBxmlwf\fR reports an error if it | |
242 encounters an encoding that it is not equipped to handle itself. With | |
243 \*(T<\fB\-w\fR\*(T>, \fBxmlwf\fR will try to use a Windows code | |
244 page. See also \*(T<\fB\-e\fR\*(T>. | |
245 .TP | |
246 \*(T<\fB\-x\fR\*(T> | |
247 Turns on parsing external entities. | |
248 | |
249 Non-validating parsers are not required to resolve external | |
250 entities, or even expand entities at all. | |
251 Expat always expands internal entities (?), | |
252 but external entity parsing must be enabled explicitly. | |
253 | |
254 External entities are simply entities that obtain their | |
255 data from outside the XML file currently being parsed. | |
256 | |
257 This is an example of an internal entity: | |
258 | |
259 .nf | |
260 | |
261 <!ENTITY vers '1.0.2'> | |
262 .fi | |
263 | |
264 And here are some examples of external entities: | |
265 | |
266 .nf | |
267 | |
268 <!ENTITY header SYSTEM "header\-&vers;.xml"> (parsed) | |
269 <!ENTITY logo SYSTEM "logo.png" PNG> (unparsed) | |
270 .fi | |
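The internal entity from the example above can be seen being expanded with a short Python sketch based on the `xml.parsers.expat` binding. The surrounding document and the helper name `character_data` are assumptions added for illustration; external entities are not exercised here.

```python
import xml.parsers.expat


def character_data(document: bytes) -> str:
    """Parse document and return its character data as one string."""
    parser = xml.parsers.expat.ParserCreate()
    chunks = []
    # Expanded entity text is delivered as ordinary character data.
    parser.CharacterDataHandler = chunks.append
    parser.Parse(document, True)
    return "".join(chunks)


if __name__ == "__main__":
    doc = b"<!DOCTYPE r [<!ENTITY vers '1.0.2'>]><r>version &vers;</r>"
    print(character_data(doc))
```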
271 .TP | |
272 \*(T<\fB\-\-\fR\*(T> | |
273 (Two hyphens.) | |
274 Terminates the list of options. This is only needed if a filename | |
275 starts with a hyphen. For example: | |
276 | |
277 .nf | |
278 | |
279 xmlwf \-\- \-myfile.xml | |
280 .fi | |
281 | |
282 will run \fBxmlwf\fR on the file | |
283 \*(T<\fI\-myfile.xml\fR\*(T>. | |
284 .PP | |
285 Older versions of \fBxmlwf\fR do not support | |
286 reading from standard input. | |
287 .SH OUTPUT | |
288 \fBxmlwf\fR outputs nothing for files which are problem-free. | |
289 If any input file is not well-formed, or if the output for any | |
290 input file cannot be opened, \fBxmlwf\fR prints a single | |
291 line describing the problem to standard output. | |
292 .PP | |
293 If the \*(T<\fB\-k\fR\*(T> option is not provided, \fBxmlwf\fR | |
294 halts upon encountering a well-formedness or output-file error. | |
295 If \*(T<\fB\-k\fR\*(T> is provided, \fBxmlwf\fR continues | |
296 processing the remaining input files, describing problems found with any of them. | |
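The difference between halting at the first problem and continuing can be sketched in Python with the `xml.parsers.expat` binding. The function `check_documents`, its `keep_going` flag, and the message format are hypothetical; they only mimic the behaviour described above and do not reproduce xmlwf's actual output.

```python
import xml.parsers.expat


def check_documents(documents: dict, keep_going: bool = False) -> list:
    """Check named documents in order; return one-line problem reports."""
    problems = []
    for name, data in documents.items():
        parser = xml.parsers.expat.ParserCreate()
        try:
            parser.Parse(data, True)
        except xml.parsers.expat.ExpatError as err:
            problems.append(f"{name}: {err}")
            if not keep_going:
                break  # default behaviour: stop at the first problem
    return problems


if __name__ == "__main__":
    docs = {"a.xml": b"<a>", "b.xml": b"<b/>", "c.xml": b"<c></d>"}
    for line in check_documents(docs, keep_going=True):
        print(line)
```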
297 .SH "EXIT STATUS" | |
298 For options \*(T<\fB\-v\fR\*(T>|\*(T<\fB\-\-version\fR\*(T> or \*(T<\fB\-h\fR\*(T>|\*(T<\fB\-\-help\fR\*(T>, \fBxmlwf\fR always exits with status code 0. For other cases, the following exit status codes are returned: | |
299 .TP | |
300 \*(T<\fB0\fR\*(T> | |
301 The input files are well-formed and the output (if requested) was written successfully. | |
302 .TP | |
303 \*(T<\fB1\fR\*(T> | |
304 An internal error occurred. | |
305 .TP | |
306 \*(T<\fB2\fR\*(T> | |
307 One or more input files were not well-formed or could not be parsed. | |
308 .TP | |
309 \*(T<\fB3\fR\*(T> | |
310 If using the \*(T<\fB\-d\fR\*(T> option, an error occurred opening an output file. | |
311 .TP | |
312 \*(T<\fB4\fR\*(T> | |
313 There was a command-line argument error in how \fBxmlwf\fR was invoked. | |
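A calling script can translate these status codes into messages. The sketch below assumes an `xmlwf` executable is available on the search path; `describe_exit_status` and `run_xmlwf` are hypothetical helper names, and the descriptions are paraphrased from the list above.

```python
import subprocess

EXIT_STATUS = {
    0: "well-formed; output (if requested) written successfully",
    1: "internal error",
    2: "input not well-formed or could not be parsed",
    3: "error opening an output file",
    4: "command-line argument error",
}


def describe_exit_status(code: int) -> str:
    """Map an exit status code to a short description."""
    return EXIT_STATUS.get(code, f"unexpected exit status {code}")


def run_xmlwf(paths: list) -> tuple:
    """Run xmlwf on paths and return (exit status, description)."""
    # "--" ends the option list so filenames may start with a hyphen.
    completed = subprocess.run(["xmlwf", "--", *paths],
                               capture_output=True, text=True)
    return completed.returncode, describe_exit_status(completed.returncode)


if __name__ == "__main__":
    print(describe_exit_status(2))
```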
314 .SH BUGS | |
315 The errors should go to standard error, not standard output. | |
316 .PP | |
317 There should be a way to get \*(T<\fB\-d\fR\*(T> to send its | |
318 output to standard output rather than forcing the user to send | |
319 it to a file. | |
320 .PP | |
321 I have no idea why anyone would want to use the | |
322 \*(T<\fB\-d\fR\*(T>, \*(T<\fB\-c\fR\*(T>, and | |
323 \*(T<\fB\-m\fR\*(T> options. If someone could explain it to | |
324 me, I'd like to add this information to this manpage. | |
325 .SH "SEE ALSO" | |
326 .nf | |
327 | |
328 The Expat home page: https://libexpat.github.io/ | |
329 The W3C XML 1.0 specification (fourth edition): https://www.w3.org/TR/2006/REC\-xml\-20060816/ | |
330 Billion laughs attack: https://en.wikipedia.org/wiki/Billion_laughs_attack | |
331 .fi | |
332 .SH AUTHOR | |
333 This manual page was originally written by Scott Bronson <\*(T<bronson@rinspin.com\*(T>> | |
334 in December 2001 for | |
335 the Debian GNU/Linux system (but may be used by others). Permission is | |
336 granted to copy, distribute and/or modify this document under | |
337 the terms of the GNU Free Documentation | |
338 License, Version 1.1. |