comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/email/architecture.rst @ 68:5028fdace37b
planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author | jpayne |
---|---|
date | Tue, 18 Mar 2025 16:23:26 -0400 |
:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

Model
    An object structure that represents an email message, and provides an
    API for creating, querying, and modifying a message.

Parser
    Takes a sequence of characters or bytes and produces a model of the
    email message represented by those characters or bytes.

Generator
    Takes a model and turns it into a sequence of characters or bytes. The
    sequence can either be intended for human consumption (a printable
    unicode string) or bytes suitable for transmission over the wire. In
    the latter case all data is properly encoded using the content transfer
    encodings specified by the relevant RFCs.

Conceptually the package is organized around the model. The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; the entire API described by this
documentation is public and stable. This allows an application with special
needs to implement its own parser and/or generator.

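As a rough illustration of how the three components fit together, the following
sketch parses a small message, modifies the model, and serializes it to bytes.
The sample message text, the ``X-Checked`` header, and the choice of
``policy.default`` and ``policy.SMTP`` are assumptions made for the example,
not requirements of the architecture.

```python
from io import BytesIO

from email import message_from_string, policy
from email.generator import BytesGenerator

# Hypothetical input used only for this illustration.
raw = "From: alice@example.com\nSubject: Hello\n\nBody text\n"

# Parser: a sequence of characters becomes a model.
msg = message_from_string(raw, policy=policy.default)

# Model: query and modify the message through its API.
subject = msg["Subject"]
msg["X-Checked"] = "yes"

# Generator: the model becomes bytes suitable for the wire.
buf = BytesIO()
BytesGenerator(buf, policy=policy.SMTP).flatten(msg)
wire = buf.getvalue()
```

Here ``policy.SMTP`` is used for output so that the generator emits CRLF line
endings; any other policy could be passed instead.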

In addition to the three major functional components, there is a fourth key
component to the architecture:

Policy
    An object that specifies various behavioral settings and carries
    implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects. For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy
controls, such as the maximum line length produced by the generator, can also
be set individually to meet specialized application requirements.

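A minimal sketch of policy-driven behavior, assuming the ``policy.default`` and
``policy.HTTP`` instances from :mod:`email.policy`. The subject text and the
40-character limit are arbitrary values chosen for the example.

```python
from email import policy
from email.message import EmailMessage

subject = " ".join(["word"] * 30)  # 149 characters, too long for one line

msg = EmailMessage()  # uses policy.default unless told otherwise
msg["Subject"] = subject

# Default policy: fold at the standard maximum line length.
default_text = msg.as_string()

# Derived policy: same behavior, but with a narrower line limit.
narrow = policy.default.clone(max_line_length=40)
narrow_text = msg.as_string(policy=narrow)

# HTTP policy: no line length limit and CRLF line endings.
http_text = msg.as_string(policy=policy.HTTP)
```

The same model is serialized three ways without being modified; only the
policy handed to the generator changes.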

The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by the
RFC: the header section and the body. The `Message` object acts as a
pseudo-dictionary of named headers. Its dictionary interface provides
convenient access to individual headers by name. However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body. A `payload` can
be one of two things: data, or a list of `Message` objects. The latter is used
to represent a multipart MIME message. Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.

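The two aspects of the model can be sketched as follows. The example uses
:class:`~email.message.EmailMessage`, a subclass of `Message`; the ``X-Hop``
header name and the body contents are made up for the illustration.

```python
from email.message import EmailMessage

msg = EmailMessage()
msg["To"] = "bob@example.com"
msg["X-Hop"] = "first"
msg["X-Hop"] = "second"  # assignment appends; it does not replace

header_order = msg.keys()        # original order, duplicates kept
first_hop = msg["x-hop"]         # lookup is case-insensitive, first match wins
all_hops = msg.get_all("X-Hop")  # every value stored under that name

# A multipart payload is a list of sub-messages; leaves hold data.
msg.set_content("plain text body")
msg.add_alternative("<p>HTML body</p>", subtype="html")
tree = [part.get_content_type() for part in msg.walk()]
leaves = [part for part in msg.walk() if not part.is_multipart()]
```

After the last two calls ``msg.get_payload()`` is a list of two sub-messages,
while each leaf returned by ``walk()`` carries a string payload.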

Message Lifecycle
-----------------

The general lifecycle of a message is:

Creation
    A `Message` object can be created by a Parser, or it can be
    instantiated as an empty message by an application.

Manipulation
    The application may examine one or more headers, and/or the
    payload, and it may modify one or more headers and/or
    the payload. This may be done on the top level `Message`
    object, or on any sub-object.

Finalization
    The Model is converted into a unicode or binary stream,
    or the model is discarded.

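The three stages might look like this for a message built by an application
rather than by a Parser. The header values, attachment bytes, and file name
are placeholders invented for the sketch.

```python
from email.message import EmailMessage

# Creation: instantiated as an empty message by the application.
msg = EmailMessage()
msg["Subject"] = "Report"
msg.set_content("See the attached data.")
msg.add_attachment(b"\x00\x01\x02", maintype="application",
                   subtype="octet-stream", filename="data.bin")

# Manipulation: on the top-level object and on a sub-object.
msg.replace_header("Subject", "Report (final)")
attachment = next(msg.iter_attachments())
attachment["X-Note"] = "reviewed"

# Finalization: convert the model to a binary or unicode stream.
as_bytes = msg.as_bytes()
as_text = msg.as_string()
```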

Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle. Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program. The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places. The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header. There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.

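One way to observe the four pathways is a policy subclass that records which
hook is called. ``TracingPolicy`` and the ``calls`` list are hypothetical names
introduced for this sketch; each hook simply delegates to
:class:`~email.policy.EmailPolicy`.

```python
from email import message_from_string
from email.policy import EmailPolicy

calls = []  # names of the hooks, in the order they were invoked


class TracingPolicy(EmailPolicy):
    """Hypothetical policy that logs each header hook before delegating."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


tracing = TracingPolicy()

# Entry via the Parser.
msg = message_from_string("Subject: hi\n\nbody\n", policy=tracing)
stored = list(msg.raw_items())  # the (name, value) tuples held by the model

msg["X-Tag"] = "demo"           # entry via the application
fetched = msg["Subject"]        # exit via the application
text = msg.as_string()          # exit via the Generator
```

Inspecting ``calls`` afterwards shows ``header_source_parse`` first, followed
by the store, fetch, and fold hooks as each pathway is exercised.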

Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote. In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings. Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec. Just as it does for the binary filenames it was introduced to handle,
the error handler allows the email package to "carry" the binary data received
during parsing along until the output stage, at which time it is regenerated
in its original form.

This carried binary data is almost entirely an implementation detail. The one
place where it is visible in the API is in the "internal" API. A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method. The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes. All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).

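The carrying mechanism can be sketched at two levels: first with the codec
alone, then with the package. The header bytes are an invented example of a
Latin-1 byte with no declared character set, and ``policy.default`` is an
assumed choice of policy.

```python
from email import message_from_bytes, policy

# Codec level: an undecodable byte becomes a lone surrogate code point
# and can be turned back into the original byte later.
raw_header = b"Subject: caf\xe9"
carried = raw_header.decode("ascii", errors="surrogateescape")
restored = carried.encode("ascii", errors="surrogateescape")

# Package level: the same data travelling through parser, model, generator.
raw_message = b"Subject: caf\xe9\n\nbody\n"
msg = message_from_bytes(raw_message, policy=policy.default)
stored_value = dict(msg.raw_items())["Subject"]  # what the model holds
app_value = str(msg["Subject"])                  # what the application sees
wire = msg.as_bytes()                            # what the generator emits
```

The model holds the surrogate-escaped form, the application-facing interface
returns a safe form with the unknown character replaced, and the binary
generator reproduces the original byte.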

Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package. It does this via the
following implementation of the four+1 Policy methods described above:

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`. Otherwise just returns the value.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

fold_binary
    Same as fold, but encodes to 'ascii'.

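These behaviors can be checked by calling the methods directly on the
``policy.compat32`` instance. The header names and values below are
placeholders chosen for the sketch.

```python
from email import policy
from email.header import Header

p = policy.compat32

# header_source_parse: split on the first colon, join continuation lines,
# strip the trailing line ending.
parsed = p.header_source_parse(["Subject: a long\n", " folded value\n"])

# header_store_parse: name and value come back unchanged.
stored = p.header_store_parse("Subject", "hello")

# header_fetch_parse: plain values pass through; values carrying
# surrogateescaped bytes come back wrapped in a Header object.
plain = p.header_fetch_parse("Subject", "hello")
binary = p.header_fetch_parse("Subject", "caf\udce9")

# fold returns str; fold_binary returns the same text encoded to bytes.
folded = p.fold("Subject", "hello")
folded_bytes = p.fold_binary("Subject", "hello")
```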

New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it. Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``. Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any `surrogateescaped` bytes into the unicode
    unknown character glyph.

fold_binary
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold_binary will serialize the message according to :rfc:`5335`.

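A short sketch of the fetch and fold behavior described above, assuming the
``policy.default`` instance implements the new algorithm. The header value is
an invented example carrying one surrogateescaped byte.

```python
from email import policy

p = policy.default  # cte_type defaults to "8bit"

# header_fetch_parse: a string is parsed into a header object, and an
# existing header object is returned unchanged.
fetched = p.header_fetch_parse("Subject", "caf\udce9")
again = p.header_fetch_parse("Subject", fetched)

# fold: returns a string; the carried byte is encoded as unknown-8bit.
text = p.fold("Subject", "caf\udce9")

# fold_binary: returns bytes; the outcome depends on cte_type.
eight_bit = p.fold_binary("Subject", "caf\udce9")
seven_bit = p.clone(cte_type="7bit").fold_binary("Subject", "caf\udce9")
```

With ``cte_type="8bit"`` the original byte reappears in the output, while the
7bit variant and the string-producing ``fold`` both emit an ``unknown-8bit``
encoded word.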