Does urllib2.urlopen() actually fetch the page?

Posted by beagleguy on Stack Overflow
Published on 2010-06-09T19:22:26Z Indexed on 2010/06/09 20:02 UTC

Hi all, I was wondering: when I use urllib2.urlopen(), does it just read the headers, or does it actually bring back the entire webpage?

I.e., does the HTML page actually get fetched on the urlopen() call or on the read() call?

import urllib2
handle = urllib2.urlopen(url)
html = handle.read()

The reason I ask is for this workflow...

  • I have a list of URLs (some of them from URL-shortening services)
  • I only want to read the webpage if I haven't seen that URL before
  • I need to call urlopen() and use geturl() to get the final page that the link goes to (after the 302 redirects), so I know whether I've crawled it yet.
  • I don't want to incur the overhead of grabbing the HTML if I've already parsed that page.
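
To make the workflow above concrete, here is a rough sketch of what I have in mind. It assumes (and this is exactly the part I'm unsure about) that urlopen() returns once the redirects have been followed and the headers are in, and that the body is only pulled through the handle by read(). The names fetch_if_new, seen, and opener are made up for illustration; opener is only a parameter so the logic can be tried without touching the network.

```python
try:
    from urllib2 import urlopen          # Python 2, as in the question
except ImportError:
    from urllib.request import urlopen   # Python 3 equivalent


def fetch_if_new(url, seen, opener=urlopen):
    """Return the page body for url, or None if its final URL was already seen.

    seen is a set of final (post-redirect) URLs. opener is a hypothetical
    hook that defaults to urlopen and can be swapped for a stub in tests.
    """
    handle = opener(url)               # redirects are followed here
    try:
        final_url = handle.geturl()    # URL after any 302 hops
        if final_url in seen:
            return None                # already crawled: never call read()
        seen.add(final_url)
        return handle.read()           # only now ask the handle for the body
    finally:
        handle.close()                 # release the connection either way
```

Usage would be something like keeping one seen = set() for the whole crawl and calling fetch_if_new(u, seen) for each URL in the list, parsing only the results that are not None.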

thanks!

© Stack Overflow or respective owner
