:mod:`email` Package Architecture
=================================

Overview
--------

The email package consists of three major components:

Model
   An object structure that represents an email message, and provides an
   API for creating, querying, and modifying a message.

Parser
   Takes a sequence of characters or bytes and produces a model of the
   email message represented by those characters or bytes.

Generator
   Takes a model and turns it into a sequence of characters or bytes.  The
   sequence can either be intended for human consumption (a printable
   unicode string) or bytes suitable for transmission over the wire.  In
   the latter case all data is properly encoded using the content transfer
   encodings specified by the relevant RFCs.

Conceptually the package is organized around the model.  The model provides both
"external" APIs intended for use by application programs using the library,
and "internal" APIs intended for use by the Parser and Generator components.
This division is intentionally a bit fuzzy; all of the API described by this
documentation is public and stable.  This allows an application with special
needs to implement its own parser and/or generator.

In addition to the three major functional components, there is a fourth key
component to the architecture:

Policy
   An object that specifies various behavioral settings and carries
   implementations of various behavior-controlling methods.
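
As a rough illustration of how these components fit together, the following
sketch parses a small message, modifies the resulting model, and serializes it
again.  The sample message text and the choice of ``policy.default`` are
assumptions made for the example, not requirements of the architecture.

```python
import io
from email import policy
from email.generator import Generator
from email.parser import Parser

# Hypothetical sample input, used only for illustration.
raw = "From: alice@example.com\nSubject: Hello\n\nJust a test.\n"

# Parser: a sequence of characters goes in, a model comes out.
msg = Parser(policy=policy.default).parsestr(raw)

# Model: headers are queried and modified through the message API.
del msg["Subject"]
msg["Subject"] = "Hello again"

# Generator: the model goes in, a sequence of characters comes out.
out = io.StringIO()
Generator(out, policy=policy.default).flatten(msg)
text = out.getvalue()
```

The same Policy object is handed to both the parser and the generator here;
passing different policies to each is equally possible.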

The Policy framework provides a simple and convenient way to control the
behavior of the library, making it possible for the library to be used in a
very flexible fashion while leveraging the common code required to parse,
represent, and generate message-like objects.  For example, in addition to the
default :rfc:`5322` email message policy, we also have a policy that manages
HTTP headers in a fashion compliant with :rfc:`2616`.  Individual policy
controls, such as the maximum line length produced by the generator, can also
be adjusted one at a time to meet specialized application requirements.


The Model
---------

The message model is implemented by the :class:`~email.message.Message` class.
The model divides a message into the two fundamental parts discussed by
:rfc:`5322`: the header section and the body.  The `Message` object acts as a
pseudo-dictionary of named headers.  Its dictionary interface provides
convenient access to individual headers by name.  However, all headers are kept
internally in an ordered list, so that the information about the order of the
headers in the original message is preserved.

The `Message` object also has a `payload` that holds the body.  A `payload` can
be one of two things: data, or a list of `Message` objects.  The latter is used
to represent a multipart MIME message.  Lists can be nested arbitrarily deeply
in order to represent the message, with all terminal leaves having non-list
data payloads.
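
The following sketch shows both halves of the model: the ordered,
dictionary-like header section and the two kinds of payload.  The header
values are made up for the example.

```python
from email.message import Message

msg = Message()
msg["Received"] = "from a.example by b.example"
msg["Subject"] = "demo"
# Assigning to an existing name appends another header; it does not replace.
msg["Received"] = "from b.example by c.example"

by_name = msg["subject"]            # lookup by name is case-insensitive
order = msg.keys()                  # header order is preserved
received = msg.get_all("Received")  # every header with that name

# A leaf message carries data as its payload ...
leaf = Message()
leaf.set_payload("hello")

# ... while a multipart message carries a list of Message objects.
outer = Message()
outer["Content-Type"] = "multipart/mixed"
outer.attach(leaf)
parts = outer.get_payload()
```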


Message Lifecycle
-----------------

The general lifecycle of a message is:

Creation
   A `Message` object can be created by a Parser, or it can be
   instantiated as an empty message by an application.

Manipulation
   The application may examine one or more headers and/or the
   payload, and it may modify one or more headers and/or
   the payload.  This may be done on the top-level `Message`
   object, or on any sub-object.

Finalization
   The Model is converted into a unicode or binary stream,
   or the model is discarded.


Header Policy Control During Lifecycle
--------------------------------------

One of the major controls exerted by the Policy is the management of headers
during the `Message` lifecycle.  Most applications don't need to be aware of
this.

A header enters the model in one of two ways: via a Parser, or by being set to
a specific value by an application program after the Model already exists.
Similarly, a header exits the model in one of two ways: by being serialized by
a Generator, or by being retrieved from a Model by an application program.  The
Policy object provides hooks for all four of these pathways.

The model storage for headers is a list of (name, value) tuples.

The Parser identifies headers during parsing, and passes them to the
:meth:`~email.policy.Policy.header_source_parse` method of the Policy.  The
result of that method is the (name, value) tuple to be stored in the model.
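
A small sketch of the parser-side hook: ``RecordingPolicy`` and the ``seen``
list are hypothetical names introduced only to make the call visible, and the
sample header lines are made up.

```python
from email import message_from_string, policy
from email.policy import Compat32

# Calling the hook directly: the source lines of one header go in,
# a (name, value) tuple comes out.
name, value = policy.compat32.header_source_parse(
    ["Subject: a long\r\n", " folded subject\r\n"]
)

seen = []  # hypothetical log of header names the parser handed to the policy


class RecordingPolicy(Compat32):
    """Hypothetical policy that records each header the parser passes in."""

    def header_source_parse(self, sourcelines):
        parsed = super().header_source_parse(sourcelines)
        seen.append(parsed[0])
        return parsed


msg = message_from_string(
    "Subject: hi\nTo: bob@example.com\n\nbody\n", policy=RecordingPolicy()
)
stored = list(msg.raw_items())  # the (name, value) tuples kept in the model
```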

When an application program supplies a header value (for example, through the
`Message` object `__setitem__` interface), the name and the value are passed to
the :meth:`~email.policy.Policy.header_store_parse` method of the Policy, which
returns the (name, value) tuple to be stored in the model.

When an application program retrieves a header (through any of the dict or list
interfaces of `Message`), the name and value are passed to the
:meth:`~email.policy.Policy.header_fetch_parse` method of the Policy to
obtain the value returned to the application.

When a Generator requests a header during serialization, the name and value are
passed to the :meth:`~email.policy.Policy.fold` method of the Policy, which
returns a string containing line breaks in the appropriate places.  The
:attr:`~email.policy.Policy.cte_type` Policy control determines whether or
not Content Transfer Encoding is performed on the data in the header.  There is
also a :meth:`~email.policy.Policy.fold_binary` method for use by generators
that produce binary output, which returns the folded header as binary data,
possibly folded at different places than the corresponding string would be.


Handling Binary Data
--------------------

In an ideal world all message data would conform to the RFCs, meaning that the
parser could decode the message into the idealized unicode message that the
sender originally wrote.  In the real world, the email package must also be
able to deal with badly formatted messages, including messages containing
non-ASCII characters that either have no indicated character set or are not
valid characters in the indicated character set.
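
As a minimal sketch of such a message, the input below contains a raw
``0xE9`` byte in a header with no declared character set.  The sample bytes
are made up, and the default ``compat32`` policy is assumed.

```python
from email import message_from_bytes
from email.header import Header

# A header containing a raw 8-bit byte and no charset information.
raw = b"Subject: caf\xe9\n\nbody\n"
msg = message_from_bytes(raw)  # uses the compat32 policy by default

# Inside the model the undecodable byte is carried as a lone surrogate.
stored = dict(msg.raw_items())["Subject"]

# The application-facing interface wraps the value rather than
# exposing the surrogate directly.
subject = msg["Subject"]

# On binary output the original byte reappears unchanged.
regenerated = msg.as_bytes()
```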

Since email messages are *primarily* text data, and operations on message data
are primarily text operations (except for binary payloads of course), the model
stores all text data as unicode strings.  Un-decodable binary inside text
data is handled by using the `surrogateescape` error handler of the ASCII
codec.  Just as with the binary filenames that this error handler was
originally introduced to handle, this allows the email package to "carry" the
binary data received during parsing along until the output stage, at which time
it is regenerated in its original form.

This carried binary data is almost entirely an implementation detail.  The one
place where it is visible in the API is in the "internal" API.  A Parser must
do the `surrogateescape` encoding of binary input data, and pass that data to
the appropriate Policy method.  The "internal" interface used by the Generator
to access header values preserves the `surrogateescaped` bytes.  All other
interfaces convert the binary data either back into bytes or into a safe form
(losing information in some cases).


Backward Compatibility
----------------------

The :class:`~email.policy.Compat32` Policy provides backward
compatibility with version 5.1 of the email package.  It does this via the
following implementation of the five Policy methods described above:

header_source_parse
   Splits the first line on the colon to obtain the name, discards any spaces
   after the colon, and joins the remainder of the line with all of the
   remaining lines, preserving the linesep characters to obtain the value.
   Trailing carriage return and/or linefeed characters are stripped from the
   resulting value string.

header_store_parse
   Returns the name and value exactly as received from the application.

header_fetch_parse
   If the value contains any `surrogateescaped` binary data, returns the value
   as a :class:`~email.header.Header` object, using the character set
   ``unknown-8bit``.  Otherwise just returns the value.

fold
   Uses :class:`~email.header.Header`'s folding to fold headers in the
   same way the email 5.1 generator did.

fold_binary
   Same as fold, but encodes to 'ascii'.


New Algorithm
-------------

header_source_parse
   Same as legacy behavior.

header_store_parse
   Same as legacy behavior.

header_fetch_parse
   If the value is already a header object, returns it.  Otherwise, parses the
   value using the new parser, and returns the resulting object as the value.
   `surrogateescaped` bytes get turned into unicode unknown character code
   points.

fold
   Uses the new header folding algorithm, respecting the policy settings.
   `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
   ``cte_type=7bit`` or ``8bit``.  Returns a string.

   At some point there will also be a ``cte_type=unicode``, and for that
   policy fold will serialize the idealized unicode message with RFC-like
   folding, converting any `surrogateescaped` bytes into the unicode
   unknown character glyph.

fold_binary
   Uses the new header folding algorithm, respecting the policy settings.
   `surrogateescaped` bytes are encoded using the ``unknown-8bit`` charset for
   ``cte_type=7bit``, and get turned back into bytes for ``cte_type=8bit``.
   Returns bytes.

   At some point there will also be a ``cte_type=unicode``, and for that
   policy fold_binary will serialize the message according to :rfc:`5335`.
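
The behavior listed above can be explored with a short sketch.  It assumes
that ``email.policy.default`` is the policy implementing the new algorithm,
and it uses the name ``fold_binary``, which is how the bytes-producing hook is
spelled in the released :mod:`email.policy` module.  The header values are
made up for the example.

```python
from email import policy

p = policy.default  # assumed here to implement the new algorithm

# header_fetch_parse: a plain string comes back as a header object ...
subject = p.header_fetch_parse("Subject", "Hello")
# ... and a value that is already a header object is returned as is.
same = p.header_fetch_parse("Subject", subject)

# fold returns a string; fold_binary returns bytes.
folded_text = p.fold("Subject", "Hello")
folded_bytes = p.fold_binary("Subject", "Hello")

# Folding respects policy settings such as the maximum line length.
narrow = p.clone(max_line_length=40)
long_value = " ".join(["word"] * 20)
folded_long = narrow.fold("Subject", long_value)
```

Because policy objects are immutable, ``clone`` is used to derive a variant
with a different setting instead of modifying ``policy.default`` in place.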