Hello,
I'm trying to implement a basic MIME parser for the multipart/related media type in C++/Qt.
So far I've written some basic parser code for headers, and I'm reading the RFCs to get an idea of how to do everything as close to the specification as possible. Unfortunately, there is a part in the RFC that confuses me a bit:
From RFC822 Section 3.1.1:
Each header field can be viewed as a single, logical line of ASCII characters, comprising a field-name and a field-body. For convenience, the field-body portion of this conceptual entity can be split into a multiple-line representation; this is called "folding". The general rule is that wherever there may be linear-white-space (NOT simply LWSP-chars), a CRLF immediately followed by AT LEAST one LWSP-char may instead be inserted. Thus, the single line [...]
Alright, so I parse a header field, and whenever a CRLF is immediately followed by linear whitespace, I drop the CRLF and join the continuation onto the previous line, so the result is a single logical header line. Let's proceed...
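A minimal sketch of that unfolding step, in plain C++ with std::string rather than QString so it stays self-contained. It follows the RFC 822 rule that a CRLF immediately followed by a space or tab counts as just that whitespace character. The name `unfoldHeaders` is hypothetical, and the sketch assumes CRLF line endings:

```cpp
#include <string>

// Hypothetical helper: unfolds an RFC 822 style header block.
// A CRLF immediately followed by a space or horizontal tab is removed;
// the whitespace character itself is kept, so the continuation line
// is joined onto the previous one. Assumes CRLF line endings.
std::string unfoldHeaders(const std::string& raw)
{
    std::string out;
    out.reserve(raw.size());
    for (std::size_t i = 0; i < raw.size(); ++i) {
        if (raw[i] == '\r' && i + 2 < raw.size() && raw[i + 1] == '\n'
            && (raw[i + 2] == ' ' || raw[i + 2] == '\t')) {
            ++i;      // now at the LF; the loop increment moves past it
            continue; // the whitespace char is copied on the next iteration
        }
        out += raw[i];
    }
    return out;
}
```

For example, `Content-Type: multipart/related;` followed by CRLF and ` foo=bar` unfolds to `Content-Type: multipart/related; foo=bar`, while ordinary CRLFs between separate header fields are left alone.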
From RFC2045 Section 5.1:
In the Augmented BNF notation of RFC 822, a Content-Type header field value is defined as follows:
content := "Content-Type" ":" type "/" subtype
           *(";" parameter)
           ; Matching of media type and subtype
           ; is ALWAYS case-insensitive.
[...]
parameter := attribute "=" value
attribute := token
             ; Matching of attributes
             ; is ALWAYS case-insensitive.
value := token / quoted-string
token := 1*<any (US-ASCII) CHAR except SPACE, CTLs,
             or tspecials>
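A strict reading of this grammar could be sketched as follows (C++17, standard library only). `ContentType` and `parseContentType` are hypothetical names. The function expects an already unfolded field body (the text after "Content-Type:"), lower-cases the type, subtype and attribute names because matching them is case-insensitive, and, as a simplifying assumption, does not handle RFC 822 comments in parentheses:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical result type. type, subtype and attribute names are stored
// lower-cased (case-insensitive matching); values keep their original case.
struct ContentType {
    std::string type;
    std::string subtype;
    std::map<std::string, std::string> params;
};

namespace {

// tspecials as listed in RFC 2045 Section 5.1.
bool isTSpecial(char c) { return std::string("()<>@,;:\\\"/[]?=").find(c) != std::string::npos; }

// token := 1*<any (US-ASCII) CHAR except SPACE, CTLs, or tspecials>
bool isTokenChar(char c)
{
    const unsigned char u = static_cast<unsigned char>(c);
    return u > 32 && u < 127 && !isTSpecial(c);
}

std::string toLower(std::string s)
{
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// Reads a (possibly empty) run of token characters starting at pos.
std::string readToken(const std::string& s, std::size_t& pos)
{
    const std::size_t start = pos;
    while (pos < s.size() && isTokenChar(s[pos])) ++pos;
    return s.substr(start, pos - start);
}

// Reads a quoted-string; s[pos] must be '"'. A backslash escapes the next char.
std::optional<std::string> readQuoted(const std::string& s, std::size_t& pos)
{
    std::string out;
    ++pos; // skip the opening quote
    while (pos < s.size()) {
        char c = s[pos++];
        if (c == '"') return out;
        if (c == '\\' && pos < s.size()) c = s[pos++];
        out += c;
    }
    return std::nullopt; // unterminated quoted-string
}

} // namespace

// Strict parse of an unfolded Content-Type field body such as
// "multipart/related; foo=bar". Any deviation from the grammar, including
// a missing ';' before a parameter, yields std::nullopt.
std::optional<ContentType> parseContentType(const std::string& v)
{
    std::size_t pos = 0;
    auto skipWs = [&] { while (pos < v.size() && (v[pos] == ' ' || v[pos] == '\t')) ++pos; };

    ContentType ct;
    skipWs();
    ct.type = toLower(readToken(v, pos));
    skipWs();
    if (ct.type.empty() || pos >= v.size() || v[pos] != '/') return std::nullopt;
    ++pos;
    skipWs();
    ct.subtype = toLower(readToken(v, pos));
    if (ct.subtype.empty()) return std::nullopt;

    while (true) {
        skipWs();
        if (pos == v.size()) return ct;         // no more parameters
        if (v[pos] != ';') return std::nullopt; // every parameter must follow a ';'
        ++pos;
        skipWs();
        const std::string attr = toLower(readToken(v, pos));
        skipWs();
        if (attr.empty() || pos >= v.size() || v[pos] != '=') return std::nullopt;
        ++pos;
        skipWs();
        std::optional<std::string> value;
        if (pos < v.size() && v[pos] == '"') {
            value = readQuoted(v, pos);
        } else {
            std::string t = readToken(v, pos);
            if (!t.empty()) value = t;
        }
        if (!value) return std::nullopt;
        ct.params[attr] = *value;
    }
}
```

Because a ";" is required before every parameter, an input such as `multipart/related; boundary=x start="<y>"` is rejected instead of being guessed at.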
Okay. So it seems that if you want to specify a Content-Type header with parameters, you simply do it like this:
Content-Type: multipart/related; foo=bar; something=else
... and a folded version of the same header would look like this:
Content-Type: multipart/related;
    foo=bar;
    something=else
Correct? Good. As I kept reading the RFCs, I came across the following in RFC2387 Section 5.1 (Examples):
Content-Type: Multipart/Related; boundary=example-1
    start="<[email protected]>";
    type="Application/X-FixedRecord"
    start-info="-o ps"

--example-1
Content-Type: Application/X-FixedRecord
Content-ID: <[email protected]>

[data]
--example-1
Content-Type: Application/octet-stream
Content-Description: The fixed length records
Content-Transfer-Encoding: base64
Content-ID: <[email protected]>

[data]
--example-1--
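On the body side of that example, splitting on the boundary could look roughly like this. It is a sketch based on the delimiter rules in RFC 2046 Section 5.1.1 (a delimiter line is "--" plus the boundary, the CRLF before it belongs to the delimiter rather than to the preceding part, and "--" boundary "--" closes the body). `splitMultipart` is a hypothetical name; the sketch assumes CRLF line endings and ignores transport padding (trailing whitespace) after the boundary:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits a multipart body into its raw parts. Each part
// keeps its own header lines, the empty line, and its content. Preamble and
// epilogue are discarded; malformed input simply stops the loop.
std::vector<std::string> splitMultipart(const std::string& body, const std::string& boundary)
{
    const std::string delim = "--" + boundary;
    const std::string crlfDelim = "\r\n" + delim;
    std::vector<std::string> parts;

    // The first delimiter is either at the very start of the body or
    // follows the CRLF that ends the preamble.
    std::size_t pos = 0;
    if (body.compare(0, delim.size(), delim) != 0) {
        pos = body.find(crlfDelim);
        if (pos == std::string::npos) return parts;
        pos += 2; // step over the CRLF, onto "--boundary"
    }

    while (true) {
        const std::size_t after = pos + delim.size();
        if (body.compare(after, 2, "--") == 0) break; // close delimiter "--boundary--"
        const std::size_t eol = body.find("\r\n", after);
        if (eol == std::string::npos) break;          // delimiter line never ends
        // The CRLF before the next delimiter belongs to that delimiter.
        const std::size_t next = body.find(crlfDelim, eol);
        if (next == std::string::npos) break;         // no further delimiter
        const std::size_t start = eol + 2;
        parts.push_back(next >= start ? body.substr(start, next - start) : std::string());
        pos = next + 2;
    }
    return parts;
}
```

Each returned string still contains the part's own header lines, so every part can go through the same header parsing as the top-level message.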
Hmm, this is odd. Do you see the Content-Type header? It has a number of parameters, but not all of them are separated by a ";": there is none between boundary and start, and none between type and start-info.
Maybe I just didn't read the RFCs correctly, but if my parser works strictly as the specification defines, the type and start-info parameters (and likewise boundary and start) would either run together into a single string or, worse, cause a parser error.
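If you decide to accept such headers anyway, one possible lenient approach is to treat the ";" between parameters as optional. This goes beyond the RFC 2045 grammar and is only an assumption about how to tolerate input like the RFC 2387 example. `parseParamsLenient` is a hypothetical helper that takes the unfolded text after "type/subtype" and fills a map with lower-cased attribute names:

```cpp
#include <algorithm>
#include <cctype>
#include <map>
#include <string>

// Hypothetical lenient parameter parser: ';' between parameters is accepted
// but NOT required, so "a=1 b=2" and "a=1; b=2" both parse. This deliberately
// deviates from RFC 2045. Returns false if something cannot be read at all.
bool parseParamsLenient(const std::string& s, std::map<std::string, std::string>& out)
{
    auto isTSpecial = [](char c) { return std::string("()<>@,;:\\\"/[]?=").find(c) != std::string::npos; };
    auto isTokenChar = [&](char c) {
        const unsigned char u = static_cast<unsigned char>(c);
        return u > 32 && u < 127 && !isTSpecial(c);
    };
    std::size_t pos = 0;
    // Skips whitespace and any number of ';' separators.
    auto skipSep = [&] { while (pos < s.size() && (s[pos] == ' ' || s[pos] == '\t' || s[pos] == ';')) ++pos; };

    while (true) {
        skipSep();
        if (pos == s.size()) return true;
        std::size_t start = pos;
        while (pos < s.size() && isTokenChar(s[pos])) ++pos;
        std::string attr = s.substr(start, pos - start);
        if (attr.empty() || pos >= s.size() || s[pos] != '=') return false;
        std::transform(attr.begin(), attr.end(), attr.begin(),
                       [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
        ++pos; // skip '='

        std::string value;
        if (pos < s.size() && s[pos] == '"') { // quoted-string with backslash escapes
            ++pos;
            bool closed = false;
            while (pos < s.size()) {
                char c = s[pos++];
                if (c == '"') { closed = true; break; }
                if (c == '\\' && pos < s.size()) c = s[pos++];
                value += c;
            }
            if (!closed) return false;
        } else { // plain token
            start = pos;
            while (pos < s.size() && isTokenChar(s[pos])) ++pos;
            value = s.substr(start, pos - start);
            if (value.empty()) return false;
        }
        out[attr] = value;
    }
}
```

The trade-off is that genuinely malformed headers are now accepted without any error, so a strict parse with a lenient fallback may be easier to debug than leniency everywhere.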
Guys, what are your thoughts on this? Is it just a typo in the RFC, or did I miss something?
Thanks!