:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

Model
    An object structure that represents an email message, and provides an
    API for creating, querying, and modifying a message.

Parser
    Takes a sequence of characters or bytes and produces a model of the
    email message represented by those characters or bytes.

Generator
    Takes a model and turns it into a sequence of characters or bytes. The
    sequence can either be intended for human consumption (a printable
    unicode string) or bytes suitable for transmission over the wire. In
    the latter case all data is properly encoded using the content transfer
    encodings specified by the relevant RFCs.
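
As a rough illustration of how the three components fit together, the sketch
below parses a few bytes into a model, touches the model, and serializes it
again. The sample message and the ``X-Seen`` header are invented for the
example; ``message_from_bytes``, ``as_string`` and ``as_bytes`` are the
standard convenience wrappers around the Parser and the Generator.

```python
from email import message_from_bytes

# Sample wire data (made up for this example).
raw = (
    b"From: alice@example.com\r\n"
    b"To: bob@example.com\r\n"
    b"Subject: Hello\r\n"
    b"\r\n"
    b"Hi Bob.\r\n"
)

# Parser: bytes in, model out.
msg = message_from_bytes(raw)

# Model: query and modify the message.
subject = msg["Subject"]
msg["X-Seen"] = "yes"

# Generator: model out, either as text for people or as bytes for the wire.
as_text = msg.as_string()
as_wire = msg.as_bytes()
```

An application with special needs could replace either end of this pipeline
while still working against the same model.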

Conceptually the package is organized around the model. The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; everything described in this
documentation is a public, stable API. This allows an application
with special needs to implement its own parser and/or generator.

In addition to the three major functional components, there is a fourth key
component to the architecture:

Policy
    An object that specifies various behavioral settings and carries
    implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects. For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy
controls, such as the maximum line length produced by the generator, can also
be adjusted one at a time to meet specialized application requirements.
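
The following is a minimal sketch of adjusting a single policy control. It
assumes the ``email.policy`` module as shipped with Python 3, where policy
objects are immutable and ``clone()`` returns a modified copy; the 40-column
limit and the header text are arbitrary choices for the example.

```python
from email import policy
from email.message import EmailMessage

# Derive a policy that differs from the default in one control only.
narrow = policy.default.clone(max_line_length=40)

msg = EmailMessage(policy=narrow)
msg["Subject"] = " ".join(["word"] * 20)   # far longer than 40 columns

# The generator consults the policy when folding the header.
text = msg.as_string()
longest = max(len(line) for line in text.splitlines())

# The HTTP-oriented policy is a separate ready-made instance with its own
# settings: CRLF line endings and no line length limit.
http_linesep = policy.HTTP.linesep
http_limit = policy.HTTP.max_line_length
```

Because ``clone()`` leaves the original untouched, an application can keep
several policies side by side and choose one per message or per operation.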


The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by the
RFC: the header section and the body. The `Message` object acts as a
pseudo-dictionary of named headers. Its dictionary interface provides
convenient access to individual headers by name. However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body. A `payload` can
be one of two things: data, or a list of `Message` objects. The latter is used
to represent a multipart MIME message. Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.
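
A short sketch of both aspects of the model follows: the ordered,
dictionary-like header interface and the two kinds of payload. The header
values and the single attached part are invented for the example.

```python
from email.message import Message

msg = Message()
msg["Received"] = "from a.example by b.example"
msg["Subject"] = "Status"
# Assigning an existing name appends another header; it does not replace.
msg["Received"] = "from b.example by c.example"

names_in_order = msg.keys()              # insertion order is preserved
first_received = msg["received"]         # lookup is case-insensitive
all_received = msg.get_all("Received")   # every value for that name

# A multipart message: the payload is a list of Message objects.
outer = Message()
outer["Content-Type"] = "multipart/mixed"
leaf = Message()
leaf.set_payload("leaf text")            # a terminal leaf holds plain data
outer.attach(leaf)

outer_is_list = outer.is_multipart()
leaf_is_list = leaf.is_multipart()
```

Dictionary-style lookup returns only the first matching header, which is why
``get_all`` exists for names that may legitimately repeat.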


Message Lifecycle
-----------------

The general lifecycle of a message is:

Creation
    A `Message` object can be created by a Parser, or it can be
    instantiated as an empty message by an application.

Manipulation
    The application may examine one or more headers, and/or the
    payload, and it may modify one or more headers and/or
    the payload. This may be done on the top level `Message`
    object, or on any sub-object.

Finalization
    The Model is converted into a unicode or binary stream,
    or the model is discarded.
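
The three stages can be sketched as follows, here with a parsed two-part
message so that manipulation of sub-objects is visible. The message text and
the ``X-Checked`` header are made up for the example.

```python
from email import message_from_string

raw = (
    "MIME-Version: 1.0\n"
    'Content-Type: multipart/mixed; boundary="XX"\n'
    "Subject: Report\n"
    "\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "first part\n"
    "--XX\n"
    "Content-Type: text/plain\n"
    "\n"
    "second part\n"
    "--XX--\n"
)

# Creation: here by a Parser; Message() would give an empty model instead.
msg = message_from_string(raw)

# Manipulation on the top-level object: replace the Subject header.
del msg["Subject"]
msg["Subject"] = "Report (reviewed)"

# Manipulation on sub-objects: walk() visits the message and every part.
for part in msg.walk():
    if not part.is_multipart():
        part["X-Checked"] = "yes"

# Finalization: convert the model into a binary stream.
wire = msg.as_bytes()
```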


Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle. Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program. The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places. The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header. There is
also a :meth:`~email.policy.Policy.binary_fold` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.
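
To make the four pathways concrete, here is a sketch of a policy that records
which hook is invoked and otherwise defers to the stock behavior.
``TracingPolicy`` and the ``calls`` list are hypothetical names used only for
this illustration. Note that the released library spells the binary variant
``fold_binary``; the sketch leaves it untouched.

```python
from email import message_from_string
from email.policy import Compat32

calls = []  # hook names, in the order they are invoked


class TracingPolicy(Compat32):
    """Hypothetical policy that logs each header hook, then delegates."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")     # header arrives via Parser
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")      # header set by application
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")      # header read by application
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")                    # header leaves via Generator
        return super().fold(name, value)


msg = message_from_string("Subject: hi\n\nbody\n", policy=TracingPolicy())
msg["X-Note"] = "added"
subject = msg["Subject"]
text = msg.as_string()
```

The log is kept in a module-level list because policy instances reject
attribute assignment after construction.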


Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote. In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings. Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec. Just as with the binary filenames that the error handler was originally
introduced to handle, this allows the email package to "carry" the binary data
received during parsing along until the output stage, at which time it is
regenerated in its original form.

This carried binary data is almost entirely an implementation detail. The one
place where it is visible in the API is in the "internal" API. A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method. The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes. All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).
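
The carrying mechanism can be shown with the codec alone, followed by a round
trip through the package. This is a sketch with an invented header; the byte
``0xE9`` stands in for any non-ASCII data with no declared character set.

```python
from email import message_from_bytes

raw_header = b"Subject: caf\xe9"   # 0xE9 is not valid ASCII

# Decoding with surrogateescape maps each undecodable byte 0xNN to the
# lone surrogate code point U+DCNN instead of raising an error.
carried = raw_header.decode("ascii", errors="surrogateescape")
smuggled = carried[-1]

# Encoding with the same handler regenerates the original bytes.
restored = carried.encode("ascii", errors="surrogateescape")

# The same idea end to end, using the default (compat32) policy:
# the byte survives parsing and reappears in the binary output.
msg = message_from_bytes(raw_header + b"\n\nbody\n")
wire = msg.as_bytes()
```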


Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package. It does this via the
following implementation of the five Policy methods described above:

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters, to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`. Otherwise returns the value unchanged.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

binary_fold
    Same as fold, but encodes to 'ascii'.
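
The ``header_fetch_parse`` rule above is the one most likely to surprise
application code, so a small sketch may help. It relies on the default policy
of ``message_from_bytes`` being the compat32 policy; the message content is
invented.

```python
from email import message_from_bytes
from email.header import Header, decode_header

msg = message_from_bytes(
    b"Subject: caf\xe9\n"          # contains an undeclared non-ASCII byte
    b"To: bob@example.com\n"
    b"\n"
    b"body\n"
)

clean = msg["To"]         # pure ASCII: returned as a plain string
dirty = msg["Subject"]    # carried binary data: returned as a Header object

# decode_header exposes the original bytes and the charset label.
chunks = decode_header(dirty)
```

Code written against this policy therefore has to be prepared for a header
lookup to return either ``str`` or ``Header``.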


New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it. Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``. Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any `surrogateescaped` bytes into the unicode
    unknown character glyph.

binary_fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy binary_fold will serialize the message according to :rfc:`5335`.
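
As a tentative illustration of the ``header_fetch_parse`` entry, the sketch
below assumes that the ``email.policy.default`` policy in current Python 3
releases implements the new algorithm described here. It checks only the fetch
behavior, since the exact folded output can differ between releases.

```python
from email import message_from_bytes, policy

msg = message_from_bytes(b"Subject: caf\xe9\n\nbody\n", policy=policy.default)

# The fetched value is a header object (a str subclass with extra
# attributes such as .name and .defects), not a bare string.
subject = msg["Subject"]
header_name = subject.name

# The undecodable byte has become U+FFFD, the unicode replacement
# character, so no lone surrogate leaks out to the application.
has_replacement = "\ufffd" in subject
has_surrogate = any(0xDC80 <= ord(ch) <= 0xDCFF for ch in subject)
```

Unlike the legacy policy, the value handed to the application is always safe
to print or encode, at the cost of losing the original byte.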