extract payload from tcpflow output

Posted by Felipe Alvarez on Stack Overflow
Published on 2010-05-19T15:20:55Z Indexed on 2010/05/20 4:10 UTC

Tcpflow outputs a bunch of files, many of which are HTTP responses from a web server. Each one contains the HTTP headers (Content-Type and other important ones) followed by the body. I'm trying to write a script that extracts just the payload data (e.g. image/jpeg, text/html) and saves it to a file [optional: with an appropriate name and file extension].

The EOL characters are \r\n (CRLF), which in my experience makes these files awkward to process with the usual tools on GNU distros.
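To illustrate the CRLF issue, here is a small sketch. It assumes a made-up sample file rather than real tcpflow output, and the helper names count_blank and count_cr_only are hypothetical. It suggests that the header/body separator is not an empty line from the tools' point of view, because the line still holds a carriage return.

```shell
#!/bin/sh
# count_blank FILE: number of truly empty lines (LF only).
# "|| true" because grep exits non-zero when the count is 0.
count_blank() { grep -c '^$' "$1" || true; }

# count_cr_only FILE: number of lines that hold nothing but a carriage return.
count_cr_only() { grep -c "$(printf '^\r$')" "$1" || true; }

# Made-up CRLF sample standing in for a tcpflow file.
sample=$(mktemp)
printf 'HTTP/1.1 200 OK\r\nContent-Type: text/plain\r\n\r\nhello\n' > "$sample"
count_blank "$sample"      # the separator is not counted as an empty line
count_cr_only "$sample"    # it is counted as a CR-only line
rm -f "$sample"
```

So a pattern like ^$ will probably never match the separator unless the CR is accounted for.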

I've been trying something along the lines of:

sed /HTTP/,/^$/d  

to delete all text from the beginning of HTTP (inclusive) to the end of \r\n\r\n (inclusive), but I have had no luck. I'm looking for help from anyone with good experience in sed and/or awk. I have zero experience with Perl, so I'd prefer to use common GNU command-line utilities for this.
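For reference, one possible direction is sketched below. It assumes GNU sed (for the \r escape and the I flag) and assumes each file starts with exactly one HTTP response. The function names and the output naming scheme are hypothetical, and the extension table is only a small illustrative subset.

```shell
#!/bin/sh
# extract_payload FILE: print everything after the first header/body separator.
# Assumption: FILE begins with a single HTTP response (status line on line 1).
extract_payload() {
    # LC_ALL=C treats the input as raw bytes so binary bodies pass through.
    # Delete from line 1 through the first line that is empty or holds only a CR.
    LC_ALL=C sed '1,/^\r\{0,1\}$/d' "$1"
}

# content_type FILE: print the media type from the first Content-Type header.
content_type() {
    # Stop at the end of the headers so the body is never scanned.
    LC_ALL=C sed -n '/^\r\{0,1\}$/q; s/^content-type:[[:space:]]*\([^;]*\).*/\1/Ip' "$1" |
        tr -d '\r' | head -n 1
}

# extension_for TYPE: map a media type to a file extension (illustrative subset).
extension_for() {
    case "$1" in
        image/jpeg) echo jpg ;;
        image/png)  echo png ;;
        image/gif)  echo gif ;;
        text/html)  echo html ;;
        text/plain) echo txt ;;
        *)          echo bin ;;
    esac
}

# Process every file given on the command line (does nothing without arguments).
for f in "$@"; do
    [ -f "$f" ] || continue
    ext=$(extension_for "$(content_type "$f")")
    extract_payload "$f" > "$f.payload.$ext"
done
```

This would not cover every case: a body sent with chunked transfer encoding or gzip content encoding stays encoded, and a flow that carries several responses over one connection would need to be split first.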

Find a sample tcpflow output file here.

Thanks,
Felipe
