comparison CSP2/CSP2_env/env-d9b9114564458d9d-741b3de822f2aaca6c6caa4325c4afce/lib/python3.8/email/architecture.rst @ 69:33d812a61356

planemo upload commit 2e9511a184a1ca667c7be0c6321a36dc4e3d116d
author jpayne
date Tue, 18 Mar 2025 17:55:14 -0400
67:0e9998148a16 69:33d812a61356
1 :mod:`email` Package Architecture
2 =================================
3
4 Overview
5 --------
6
7 The email package consists of three major components:
8
9 Model
10 An object structure that represents an email message, and provides an
11 API for creating, querying, and modifying a message.
12
13 Parser
14 Takes a sequence of characters or bytes and produces a model of the
15 email message represented by those characters or bytes.
16
17 Generator
18 Takes a model and turns it into a sequence of characters or bytes. The
19 sequence can either be intended for human consumption (a printable
20 unicode string) or bytes suitable for transmission over the wire. In
21 the latter case all data is properly encoded using the content transfer
22 encodings specified by the relevant RFCs.
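
As a rough illustration of how the three components fit together, the sketch below parses a small message, queries the resulting model, and regenerates the text. The addresses and message text are invented for the example.

```python
from io import StringIO

from email import message_from_string
from email.generator import Generator

# Sample input, made up for this example.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Hi Bob.\n"
)

# Parser: a sequence of characters goes in, a model comes out.
msg = message_from_string(raw)

# Model: query a header through the dictionary-like interface.
subject = msg["Subject"]

# Generator: the model goes in, a sequence of characters comes out.
buf = StringIO()
Generator(buf).flatten(msg)
round_tripped = buf.getvalue()
```

For a simple, well-formed message like this one, the regenerated text matches the input.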
23
24 Conceptually the package is organized around the model. The model provides both
25 "external" APIs intended for use by application programs using the library,
26 and "internal" APIs intended for use by the Parser and Generator components.
27 This division is intentionally a bit fuzzy; all of the API described by this
28 documentation is public and stable. This allows an application
29 with special needs to implement its own parser and/or generator.
30
31 In addition to the three major functional components, there is a fourth key
32 component to the architecture:
33
34 Policy
35 An object that specifies various behavioral settings and carries
36 implementations of various behavior-controlling methods.
37
38 The Policy framework provides a simple and convenient way to control the
39 behavior of the library, making it possible for the library to be used in a
40 very flexible fashion while leveraging the common code required to parse,
41 represent, and generate message-like objects. For example, in addition to the
42 default :rfc:`5322` email message policy, we also have a policy that manages
43 HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy
44 controls, such as the maximum line length produced by the generator, can also
45 be set separately to meet specialized application requirements.
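
The sketch below shows one way to adjust a single policy control. It assumes the `email.policy.default` and `email.policy.HTTP` policy instances and the `clone` method from the standard library; the 40-character limit and the sample subject are arbitrary choices for the example.

```python
from email import message_from_string, policy

# A subject long enough that it has to be folded; the text is made up.
long_subject = " ".join(["word"] * 30)
raw = "Subject: " + long_subject + "\n\nbody\n"

msg = message_from_string(raw, policy=policy.default)

# Clone the default policy, changing only the maximum line length.
narrow = policy.default.clone(max_line_length=40)
text = msg.as_string(policy=narrow)
longest = max(len(line) for line in text.splitlines())

# The HTTP policy is another variation: CRLF line endings and no
# line length limit.
http_linesep = policy.HTTP.linesep
http_limit = policy.HTTP.max_line_length
```

Policy objects are immutable, so `clone` returns a new policy and leaves `policy.default` unchanged.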
46
47
48 The Model
49 ---------
50
51 The message model is implemented by the :class:`~email.message.Message` class.
52 The model divides a message into the two fundamental parts discussed by the
53 RFC: the header section and the body. The `Message` object acts as a
54 pseudo-dictionary of named headers. Its dictionary interface provides
55 convenient access to individual headers by name. However, all headers are kept
56 internally in an ordered list, so that the information about the order of the
57 headers in the original message is preserved.
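
A minimal sketch of the pseudo-dictionary behavior follows, using made-up header values. Unlike a real dictionary, assigning to an existing name appends a second header instead of replacing the first, and the original order is kept.

```python
from email.message import Message

msg = Message()
msg["Received"] = "from a.example by b.example"
msg["Subject"] = "Status"
msg["Received"] = "from b.example by c.example"  # appended, not replaced

# Lookup by name is case-insensitive and returns the first match.
subject = msg["subject"]

# All values for a repeated header, in their original order.
received = msg.get_all("Received")

# keys() reflects the underlying ordered list, duplicates included.
names = msg.keys()
```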
58
59 The `Message` object also has a `payload` that holds the body. A `payload` can
60 be one of two things: data, or a list of `Message` objects. The latter is used
61 to represent a multipart MIME message. Lists can be nested arbitrarily deeply
62 in order to represent the message, with all terminal leaves having non-list
63 data payloads.
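
The nesting can be sketched with the MIME helper classes. The `leaves` function is a hypothetical helper written for this example, not part of the package; it walks the payload lists down to the parts that carry data.

```python
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Build a two-level structure: mixed -> (alternative -> plain, html), plain.
alternative = MIMEMultipart("alternative")
alternative.attach(MIMEText("plain body", "plain"))
alternative.attach(MIMEText("<p>html body</p>", "html"))

outer = MIMEMultipart("mixed")
outer.attach(alternative)
outer.attach(MIMEText("attachment text", "plain"))


def leaves(msg):
    """Yield every part whose payload is data and not a list of parts."""
    if msg.is_multipart():
        for part in msg.get_payload():
            yield from leaves(part)
    else:
        yield msg


leaf_types = [part.get_content_type() for part in leaves(outer)]
```

Here `outer.get_payload()` is a list of two `Message` objects, the first of which has a list payload of its own.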
64
65
66 Message Lifecycle
67 -----------------
68
69 The general lifecycle of a message is:
70
71 Creation
72 A `Message` object can be created by a Parser, or it can be
73 instantiated as an empty message by an application.
74
75 Manipulation
76 The application may examine one or more headers, and/or the
77 payload, and it may modify one or more headers and/or
78 the payload. This may be done on the top level `Message`
79 object, or on any sub-object.
80
81 Finalization
82 The Model is converted into a unicode or binary stream,
83 or the model is discarded.
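
The three stages might look like this for a message that the application builds itself. The header values and body text are placeholders.

```python
from email.message import Message

# Creation: instantiate an empty message.
msg = Message()

# Manipulation: set headers and payload, then revise a header.
msg["From"] = "alice@example.com"
msg["Subject"] = "Draft"
msg.set_payload("First version.\n")
msg.replace_header("Subject", "Final")

# Finalization: convert the model into a unicode string or into bytes.
as_text = msg.as_string()
as_wire = msg.as_bytes()
```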
84
85
86
87 Header Policy Control During Lifecycle
88 --------------------------------------
89
90 One of the major controls exerted by the Policy is the management of headers
91 during the `Message` lifecycle. Most applications don't need to be aware of
92 this.
93
94 A header enters the model in one of two ways: via a Parser, or by being set to
95 a specific value by an application program after the Model already exists.
96 Similarly, a header exits the model in one of two ways: by being serialized by
97 a Generator, or by being retrieved from a Model by an application program. The
98 Policy object provides hooks for all four of these pathways.
99
100 The model storage for headers is a list of (name, value) tuples.
101
102 The Parser identifies headers during parsing, and passes them to the
103 :meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
104 result of that method is the (name, value) tuple to be stored in the model.
105
106 When an application program supplies a header value (for example, through the
107 `Message` object `__setitem__` interface), the name and the value are passed to
108 the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
109 returns the (name, value) tuple to be stored in the model.
110
111 When an application program retrieves a header (through any of the dict or list
112 interfaces of `Message`), the name and value are passed to the
113 :meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
114 obtain the value returned to the application.
115
116 When a Generator requests a header during serialization, the name and value are
117 passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
118 returns a string containing line breaks in the appropriate places. The
119 :attr:`~email.policy.Policy.cte_type` Policy control determines whether or
120 not Content Transfer Encoding is performed on the data in the header. There is
121 also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
122 that produce binary output, which returns the folded header as binary data,
123 possibly folded at different places than the corresponding string would be.
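
To make the four pathways visible, the sketch below subclasses `email.policy.Compat32` and records each hook as it is called. `TracingPolicy` and the `calls` list are hypothetical names introduced only for this illustration.

```python
from email import message_from_string
from email.policy import Compat32

calls = []  # names of the hooks, in the order they are invoked


class TracingPolicy(Compat32):
    """Hypothetical policy that logs each header hook, then defers."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


tracing = TracingPolicy()

# Entering via a Parser: one header_source_parse call per parsed header.
msg = message_from_string("Subject: parsed\n\nbody\n", policy=tracing)

# Entering via the application: header_store_parse.
msg["X-Note"] = "set by the application"

# Exiting via the application: header_fetch_parse.
fetched = msg["Subject"]

# Exiting via a Generator: fold is called once per header.
text = msg.as_string()
```

After this runs, `calls` lists one source parse, one store parse, one fetch parse, and two folds (one for each header in the message).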
124
125
126 Handling Binary Data
127 --------------------
128
129 In an ideal world all message data would conform to the RFCs, meaning that the
130 parser could decode the message into the idealized unicode message that the
131 sender originally wrote. In the real world, the email package must also be
132 able to deal with badly formatted messages, including messages containing
133 non-ASCII characters that either have no indicated character set or are not
134 valid characters in the indicated character set.
135
136 Since email messages are *primarily* text data, and operations on message data
137 are primarily text operations (except for binary payloads of course), the model
138 stores all text data as unicode strings. Un-decodable binary inside text
139 data is handled by using the `surrogateescape` error handler of the ASCII
140 codec. As with the binary filenames the error handler was introduced to
141 handle, this allows the email package to "carry" the binary data received
142 during parsing along until the output stage, at which time it is regenerated
143 in its original form.
144
145 This carried binary data is almost entirely an implementation detail. The one
146 place where it is visible in the API is in the "internal" API. A Parser must
147 do the `surrogateescape` encoding of binary input data, and pass that data to
148 the appropriate Policy method. The "internal" interface used by the Generator
149 to access header values preserves the `surrogateescaped` bytes. All other
150 interfaces convert the binary data either back into bytes or into a safe form
151 (losing information in some cases).
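
The mechanism can be sketched in two steps: first the codec-level round trip, then the same idea applied to a message whose header contains an undeclared non-ASCII byte. The byte string used here is an arbitrary example.

```python
from email import message_from_bytes
from email.header import Header

# Codec level: an undecodable byte becomes a lone surrogate code point
# and turns back into the same byte when encoded again.
raw = b"caf\xe9"
carried = raw.decode("ascii", "surrogateescape")
restored = carried.encode("ascii", "surrogateescape")

# Message level: the byte is carried through the model and regenerated
# in its original form at the output stage.
wire = b"Subject: caf\xe9\n\nbody\n"
msg = message_from_bytes(wire)
regenerated = msg.as_bytes()

# Fetching the header through the application-facing interface does not
# expose the surrogate; under the default compat32 policy the value is
# wrapped in a Header object instead.
fetched = msg["Subject"]
```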
152
153
154 Backward Compatibility
155 ----------------------
156
157 The :class:`~email.policy.Policy.Compat32` Policy provides backward
158 compatibility with version 5.1 of the email package. It does this via the
159 following implementation of the four+1 Policy methods described above:
160
161 header_source_parse
162 Splits the first line on the colon to obtain the name, discards any spaces
163 after the colon, and joins the remainder of the line with all of the
164 remaining lines, preserving the linesep characters to obtain the value.
165 Trailing carriage return and/or linefeed characters are stripped from the
166 resulting value string.
167
168 header_store_parse
169 Returns the name and value exactly as received from the application.
170
171 header_fetch_parse
172 If the value contains any `surrogateescaped` binary data, return the value
173 as a :class:`~email.header.Header` object, using the character set
174 `unknown-8bit`. Otherwise just returns the value.
175
176 fold
177 Uses :class:`~email.header.Header`'s folding to fold headers in the
178 same way the email 5.1 generator did.
179
180 fold_binary
181 Same as fold, but encodes to 'ascii'.
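
The behaviors listed above can be observed by calling the methods on the `email.policy.compat32` instance directly. Note that the standard library spells the binary variant `fold_binary`. The header values are placeholders.

```python
from email.header import Header
from email.policy import compat32

# header_source_parse: split on the first colon, keep internal line breaks,
# strip the trailing CR/LF.
name, value = compat32.header_source_parse(
    ["Subject: hello\r\n", " world\r\n"]
)

# header_store_parse: name and value come back unchanged.
stored = compat32.header_store_parse("Subject", "as given")

# header_fetch_parse: plain values are returned as is; values carrying
# surrogateescaped bytes are wrapped in a Header object.
plain = compat32.header_fetch_parse("Subject", "plain ascii")
carried = b"caf\xe9".decode("ascii", "surrogateescape")
wrapped = compat32.header_fetch_parse("Subject", carried)

# fold returns a string; fold_binary returns the same header as bytes.
folded = compat32.fold("Subject", "hello")
folded_bytes = compat32.fold_binary("Subject", "hello")
```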
182
183
184 New Algorithm
185 -------------
186
187 header_source_parse
188 Same as legacy behavior.
189
190 header_store_parse
191 Same as legacy behavior.
192
193 header_fetch_parse
194 If the value is already a header object, returns it. Otherwise, parses the
195 value using the new parser, and returns the resulting object as the value.
196 `surrogateescaped` bytes get turned into unicode unknown character code
197 points.
198
199 fold
200 Uses the new header folding algorithm, respecting the policy settings.
201 surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
202 ``cte_type=7bit`` or ``8bit``. Returns a string.
203
204 At some point there will also be a ``cte_type=unicode``, and for that
205 policy fold will serialize the idealized unicode message with RFC-like
206 folding, converting any surrogateescaped bytes into the unicode
207 unknown character glyph.
208
209 fold_binary
210 Uses the new header folding algorithm, respecting the policy settings.
211 surrogateescaped bytes are encoded using the ``unknown-8bit`` charset for
212 ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
213 Returns bytes.
214
215 At some point there will also be a ``cte_type=unicode``, and for that
216 policy fold_binary will serialize the message according to :rfc:`5335`.
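
For comparison, here is a sketch of the same hooks under `email.policy.default`, which is assumed here to be the standard library policy implementing the new algorithm. The address and header text are made up.

```python
from email import policy

default = policy.default

# header_fetch_parse returns a header object: a str subclass that also
# exposes the parsed structure.
to_header = default.header_fetch_parse("To", "Alice <alice@example.com>")
addr_spec = to_header.addresses[0].addr_spec

# A value that is already a header object is returned unchanged.
same = default.header_fetch_parse("To", to_header)

# surrogateescaped bytes become the unicode unknown character.
carried = b"caf\xe9".decode("ascii", "surrogateescape")
sanitized = default.header_fetch_parse("Subject", carried)

# fold respects the policy's max_line_length (78 by default);
# fold_binary returns the folded header as bytes.
long_value = " ".join(["word"] * 30)
folded = default.fold("Subject", long_value)
folded_bytes = default.fold_binary("Subject", long_value)
```

With plain ASCII input such as `long_value`, the string and binary results differ only in type.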