How do I modify this download function in Python?

Posted by TIMEX on Stack Overflow
Published on 2011-01-13T02:17:13Z Indexed on 2011/01/13 2:54 UTC

Right now, my download function is unreliable. With gzip-compressed responses or images, it sometimes doesn't work.

How do I modify this download function so that it works with any response, regardless of gzip or any other header?

How do I automatically detect whether the response is gzip-compressed? I don't want to pass True/False every time, like I do right now.

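import gzip
import random
import socket
import urllib2
from StringIO import StringIO

def download(source_url, g=False, correct_url=True):
    # g: the caller passes True when the response is expected to be gzipped.
    # correct_url is accepted but not used anywhere in this function.
    try:
        socket.setdefaulttimeout(10)
        # Pick one browser-like User-Agent string at random per request.
        agents = [
            'Mozilla/4.0 (compatible; MSIE 5.5; Windows NT 5.0)',
            'Mozilla/4.0 (compatible; MSIE 7.0b; Windows NT 5.1)',
            'Microsoft Internet Explorer/4.0b1 (Windows 95)',
            'Opera/8.00 (Windows NT 5.1; U; en)',
        ]
        ree = urllib2.Request(source_url)
        ree.add_header('User-Agent', random.choice(agents))
        # Always advertises gzip support, so the server may compress the body
        # even when g is False.
        ree.add_header('Accept-encoding', 'gzip')
        opener = urllib2.build_opener()
        h = opener.open(ree).read()
        if g:
            # Decompress the raw body from an in-memory buffer.
            compressedstream = StringIO(h)
            gzipper = gzip.GzipFile(fileobj=compressedstream)
            return gzipper.read()
        return h
    except Exception:
        # Any failure (timeout, HTTP error, bad gzip data) yields "".
        return ""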
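
One possible way to drop the g flag is to let the response itself say whether it is gzipped: check the Content-Encoding header, and fall back to the two magic bytes (0x1f 0x8b) that start every gzip stream. The sketch below is only an illustration under those assumptions, not a definitive fix. It is written for Python 3 rather than the Python 2 used above, and the helper name maybe_gunzip and its parameters are hypothetical.

```python
import gzip
import io
import zlib

# Every gzip stream begins with these two bytes.
GZIP_MAGIC = b"\x1f\x8b"


def maybe_gunzip(body, content_encoding=None):
    """Return `body` decompressed if it appears to be gzip, else unchanged.

    `body` is the raw response bytes; `content_encoding` is the value of the
    Content-Encoding response header, or None if the header was absent.
    (Hypothetical helper, named here for illustration only.)
    """
    header_says_gzip = (content_encoding or "").strip().lower() in ("gzip", "x-gzip")
    if header_says_gzip or body[:2] == GZIP_MAGIC:
        try:
            return gzip.GzipFile(fileobj=io.BytesIO(body)).read()
        except (OSError, EOFError, zlib.error):
            # The data was not valid gzip after all; hand back the raw bytes.
            return body
    return body
```

Inside download, the body would then be passed through this helper together with the header value read from the response object (with urllib2 this is commonly obtained as the result of opener.open(ree).info().get('Content-Encoding')), and the g parameter could be removed. Note that the magic-byte check also decompresses a file that is itself a .gz archive served without a Content-Encoding header; if such files should be returned untouched, rely on the header check alone.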

© Stack Overflow or respective owner
