:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

Model
    An object structure that represents an email message, and provides an
    API for creating, querying, and modifying a message.

Parser
    Takes a sequence of characters or bytes and produces a model of the
    email message represented by those characters or bytes.

Generator
    Takes a model and turns it into a sequence of characters or bytes. The
    sequence can either be intended for human consumption (a printable
    unicode string) or bytes suitable for transmission over the wire. In
    the latter case all data is properly encoded using the content transfer
    encodings specified by the relevant RFCs.

Conceptually the package is organized around the model. The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy: everything described by this
documentation is public, stable API. This allows an application with special
needs to implement its own parser and/or generator.

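As a minimal sketch of how the three components fit together, the following
passes a small message through a Parser, queries the resulting model, and
serializes it again with a Generator. The sample message and the variable names
are invented for illustration.

```python
from email import message_from_string
from email.generator import Generator
from io import StringIO

# Hypothetical sample message, invented for this sketch.
raw = (
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Hello\n"
    "\n"
    "Body text\n"
)

# Parser: a sequence of characters goes in, a model comes out.
msg = message_from_string(raw)

# Model: query the message through its API.
subject = msg["Subject"]
body = msg.get_payload()

# Generator: the model goes in, a sequence of characters comes out.
buffer = StringIO()
Generator(buffer).flatten(msg)
serialized = buffer.getvalue()
```

``email.message_from_bytes`` and ``email.generator.BytesGenerator`` are the
bytes-oriented counterparts of the calls used here.
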
In addition to the three major functional components, there is a fourth key
component to the architecture:

Policy
    An object that specifies various behavioral settings and carries
    implementations of various behavior-controlling methods.

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects. For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`. Individual policy
controls, such as the maximum line length produced by the generator, can also
be adjusted one at a time to meet specialized application requirements.

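The sketch below illustrates adjusting a single policy control. It assumes the
stock ``email.policy.default`` and ``email.policy.HTTP`` instances and uses
``clone()`` to derive a policy with a shorter maximum line length; the subject
text and the limit of 40 are arbitrary choices for the example.

```python
from email import policy
from email.message import EmailMessage

msg = EmailMessage()  # uses email.policy.default
msg["Subject"] = " ".join(["word"] * 30)  # long enough to need folding

# Policies are immutable; clone() returns a copy with the override applied.
narrow = policy.default.clone(max_line_length=40)

# Serialize with the narrower policy and measure the longest output line.
folded = msg.as_string(policy=narrow)
longest = max(len(line) for line in folded.splitlines())

# The HTTP policy is the default policy with two controls changed.
http_linesep = policy.HTTP.linesep
http_max = policy.HTTP.max_line_length
```

Because the override is supplied at serialization time, the message object
itself keeps the policy it was created with.
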

The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts described by
:rfc:`5322`: the header section and the body. The `Message` object acts as a
pseudo-dictionary of named headers. Its dictionary interface provides
convenient access to individual headers by name. However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body. A `payload` can
be one of two things: data, or a list of `Message` objects. The latter is used
to represent a multipart MIME message. Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.

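To make the two halves of the model concrete, here is a small sketch that
builds a nested message by hand. The header values are invented; the point is
only that repeated headers keep their order and that a payload is either data
or a list of further `Message` objects.

```python
from email.message import Message

outer = Message()
outer["Received"] = "from relay-b.example"
outer["Subject"] = "Report"
outer["Received"] = "from relay-a.example"  # assignment appends, it does not replace
outer["Content-Type"] = "multipart/mixed"

inner = Message()
inner["Content-Type"] = "multipart/alternative"

leaf = Message()
leaf["Content-Type"] = "text/plain"
leaf.set_payload("plain text body")  # a terminal leaf holds data, not a list

inner.attach(leaf)   # inner's payload becomes [leaf]
outer.attach(inner)  # outer's payload becomes [inner]

names = outer.keys()                    # header names in stored order
received = outer.get_all("Received")    # every value for a repeated header
leaf_types = [
    part.get_content_type()
    for part in outer.walk()            # depth-first over the nested payloads
    if not part.is_multipart()
]
```
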

Message Lifecycle
-----------------

The general lifecycle of a message is:

Creation
    A `Message` object can be created by a Parser, or it can be
    instantiated as an empty message by an application.

Manipulation
    The application may examine one or more headers, and/or the
    payload, and it may modify one or more headers and/or
    the payload. This may be done on the top level `Message`
    object, or on any sub-object.

Finalization
    The Model is converted into a unicode or binary stream,
    or the model is discarded.

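The three stages can be sketched for a message that an application creates
from scratch. This example assumes the :class:`~email.message.EmailMessage`
subclass and invented header values; a parsed message would go through the
same manipulation and finalization steps.

```python
from email.message import EmailMessage

# Creation: instantiate an empty message.
msg = EmailMessage()
msg["Subject"] = "Draft"
msg.set_content("Hello\n")

# Manipulation: examine and modify headers on the model.
old_subject = msg["Subject"]
msg.replace_header("Subject", "Final")

# Finalization: convert the model into a unicode string or into bytes.
as_text = msg.as_string()
as_wire = msg.as_bytes()
```
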


Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle. Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program. The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy. The
result of that method is the (name, value) tuple to be stored in the model.

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places. The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header. There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.

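One way to observe the four pathways is to subclass a policy and record each
hook as it fires. ``TracingPolicy`` and the ``calls`` list are hypothetical
names introduced for this sketch; the overridden methods simply delegate to
:class:`~email.policy.EmailPolicy`.

```python
from email import message_from_string
from email.policy import EmailPolicy

calls = []  # hook names, in the order they were invoked


class TracingPolicy(EmailPolicy):
    """Hypothetical policy that records which header hook was called."""

    def header_source_parse(self, sourcelines):
        calls.append("header_source_parse")
        return super().header_source_parse(sourcelines)

    def header_store_parse(self, name, value):
        calls.append("header_store_parse")
        return super().header_store_parse(name, value)

    def header_fetch_parse(self, name, value):
        calls.append("header_fetch_parse")
        return super().header_fetch_parse(name, value)

    def fold(self, name, value):
        calls.append("fold")
        return super().fold(name, value)


# Entry via the Parser.
msg = message_from_string("Subject: hi\n\nbody\n", policy=TracingPolicy())
# Entry via the application.
msg["X-Example"] = "1"
# Exit via the application.
subject = msg["Subject"]
# Exit via the Generator.
text = msg.as_string()
```

The library may also call these hooks for its own purposes, so the recorded
list should be read as "at least these calls happened" rather than as an exact
trace.
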

Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote. In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings. Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec. Just as it does for the undecodable filenames that the error handler was
originally introduced to handle, this allows the email package to "carry" the
binary data received during parsing along until the output stage, at which time
it is regenerated in its original form.

This carried binary data is almost entirely an implementation detail. The one
place where it is visible in the API is in the "internal" API. A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method. The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes. All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).

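The carrying mechanism can be illustrated in two steps: first with the codec
error handler on its own, then with a message whose header contains a byte
that is not valid ASCII. The sample bytes are invented, and the message is
parsed under the default legacy policy.

```python
from email import message_from_bytes

# The error handler maps each undecodable byte to a lone surrogate ...
carried = b"caf\xe9".decode("ascii", "surrogateescape")
# ... and maps that surrogate back to the original byte on encoding.
restored = carried.encode("ascii", "surrogateescape")

# Hypothetical badly formed message: raw 8-bit data, no charset indicated.
raw = b"Subject: caf\xe9\n\nbody\n"
msg = message_from_bytes(raw)

# The binary output stage regenerates the data in its original form.
wire = msg.as_bytes()
```
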

Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package. It does this via the
following implementation of the four+1 Policy methods described above:

header_source_parse
    Splits the first line on the colon to obtain the name, discards any spaces
    after the colon, and joins the remainder of the line with all of the
    remaining lines, preserving the linesep characters to obtain the value.
    Trailing carriage return and/or linefeed characters are stripped from the
    resulting value string.

header_store_parse
    Returns the name and value exactly as received from the application.

header_fetch_parse
    If the value contains any `surrogateescaped` binary data, returns the value
    as a :class:`~email.header.Header` object, using the character set
    `unknown-8bit`. Otherwise just returns the value.

fold
    Uses :class:`~email.header.Header`'s folding to fold headers in the
    same way the email 5.1 generator did.

fold_binary
    Same as fold, but encodes to 'ascii'.

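A short sketch of the legacy behavior, calling the hooks on the
``email.policy.compat32`` instance directly. The header text is invented for
the example.

```python
from email import policy
from email.header import Header

compat32 = policy.compat32

# Source lines as a parser would pass them: first line plus a continuation.
parsed = compat32.header_source_parse(["Subject: first\n", " second\n"])

# Values supplied by the application are stored untouched.
stored = compat32.header_store_parse("Subject", "plain value")

# Fetching a clean value returns it as-is; a value carrying
# surrogateescaped bytes comes back wrapped in a Header object.
clean = compat32.header_fetch_parse("Subject", "plain value")
dirty = compat32.header_fetch_parse("Subject", "caf\udce9")

text_line = compat32.fold("Subject", "plain value")
wire_line = compat32.fold_binary("Subject", "plain value")
```
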

New Algorithm
-------------

header_source_parse
    Same as legacy behavior.

header_store_parse
    Same as legacy behavior.

header_fetch_parse
    If the value is already a header object, returns it. Otherwise, parses the
    value using the new parser, and returns the resulting object as the value.
    `surrogateescaped` bytes get turned into unicode unknown character code
    points.

fold
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit`` or ``8bit``. Returns a string.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold will serialize the idealized unicode message with RFC-like
    folding, converting any `surrogateescaped` bytes into the unicode
    unknown character glyph.

fold_binary
    Uses the new header folding algorithm, respecting the policy settings.
    `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
    ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
    Returns bytes.

    At some point there will also be a ``cte_type=unicode``, and for that
    policy fold_binary will serialize the message according to :rfc:`5335`.
jpayne@68 209 binary_fold
jpayne@68 210 Uses the new header folding algorithm, respecting the policy settings.
jpayne@68 211 surrogateescaped bytes are encoded using the `unknown-8bit` charset for
jpayne@68 212 ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
jpayne@68 213 Returns bytes.
jpayne@68 214
jpayne@68 215 At some point there will also be a ``cte_type=unicode``, and for that
jpayne@68 216 policy binary_fold will serialize the message according to :rfc:``5335``.