Parsing every part of an HTTP header field-value
Posted by brickner on Stack Overflow
        Published on 2010-06-13T05:40:46Z
        
Hi all.
I'm parsing HTTP data directly from captured packets (you can assume the TCP stream has already been reassembled).
I'm looking for the best way to parse HTTP as accurately as possible.
The main issue here is the HTTP header.
Looking at RFC 2616 (HTTP/1.1), it seems that parsing the HTTP header is complex. The RFC defines elaborate grammar rules, in augmented BNF, for the different parts of the header.
Should I translate these rules into regular expressions to parse the different parts of the HTTP header?
So far I've only written parsing for the generic message-header rule:
message-header = field-name ":" [ field-value ]
I also replace inner LWS with a single SP and combine repeated headers that share a field-name into one comma-separated field-value, as described in section 4.2.
However, section 14.9 (Cache-Control), for example, shows that parsing the individual parts of a field-value requires a much more complex scheme.
How do you suggest I handle the complex parts of HTTP parsing (specifically the field-value), assuming I want to give users of the parser the full capabilities of HTTP and to parse every part of it?
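For illustration, the generic step described above could be sketched in Python roughly as follows. This is not my actual code: the function name `parse_headers`, the lowercased keys, and the choice to silently skip lines without a colon are assumptions made for this sketch. It also collapses whitespace inside quoted-strings, which a stricter parser would have to avoid.

```python
import re


def parse_headers(raw):
    """Parse a raw header block into {lowercased field-name: field-value}."""
    # A line break followed by SP/HT is folding LWS (section 2.2);
    # replace the whole fold with a single SP before splitting lines.
    unfolded = re.sub(r"\r?\n[ \t]+", " ", raw)
    headers = {}
    for line in re.split(r"\r?\n", unfolded):
        if not line:
            break  # an empty line terminates the header block
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a message-header; skipped in this sketch
        name = name.strip().lower()  # field-names are case-insensitive
        # Collapse remaining runs of SP/HT and trim the ends.
        value = re.sub(r"[ \t]+", " ", value).strip()
        if name in headers:
            # Section 4.2: repeated fields combine into a comma-separated list.
            headers[name] += ", " + value
        else:
            headers[name] = value
    return headers
```

Calling `parse_headers` on a block with two `Accept` lines yields a single `accept` entry whose value joins both with `", "`.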
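To make the difficulty concrete, here is a minimal sketch of a per-header parser for Cache-Control, following the `cache-directive = token [ "=" ( token | quoted-string ) ]` shape from section 14.9. The names `TOKEN`, `DIRECTIVE`, and `parse_cache_control` are hypothetical, and the sketch is deliberately incomplete: it rejects empty list elements (which the RFC's `#rule` tolerates) and does not interpret individual directives.

```python
import re

# token: any CHAR except CTLs and separators (RFC 2616, section 2.2).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
# One directive, an optional "=" argument, then a comma or end of input.
DIRECTIVE = re.compile(
    r'\s*(' + TOKEN + r')(?:=(' + TOKEN + r'|"(?:[^"\\]|\\.)*"))?\s*(?:,|$)'
)


def parse_cache_control(value):
    """Split a Cache-Control field-value into {directive: argument or None}."""
    directives = {}
    pos = 0
    while pos < len(value):
        m = DIRECTIVE.match(value, pos)
        if m is None:
            raise ValueError("malformed directive at offset %d" % pos)
        name, arg = m.group(1).lower(), m.group(2)
        if arg is not None and arg.startswith('"'):
            # Strip the surrounding quotes and undo quoted-pair escapes.
            arg = re.sub(r"\\(.)", r"\1", arg[1:-1])
        directives[name] = arg
        pos = m.end()
    return directives
```

Note that a naive `value.split(",")` would break on `no-cache="Set-Cookie, Via"`, because the comma sits inside a quoted-string. This is the kind of detail that would have to be repeated, with variations, for every structured header.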
Design suggestions for this would also be appreciated.
Thanks.
© Stack Overflow or respective owner