Python: find <title>

Posted by Peter on Stack Overflow
Published on 2010-05-20T10:02:15Z

I have this:

response = urllib2.urlopen(url)
html     = response.read()

begin = html.find('<title>')
end   = html.find('</title>',begin)
title = html[begin+len('<title>'):end].strip()

If the url is http://www.google.com, the title comes out fine as "Google",

but if the url is "http://www.britishcouncil.org/learning-english-gateway", the title becomes

"<!doctype html public "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<base href="http://www.britishcouncil.org/" />
<META http-equiv="Content-Type" Content="text/html;charset=utf-8">
<meta name="WT.sp" content="Learning;Home Page Smart View" />
<meta name="WT.cg_n" content="Learn English Gateway" />
<META NAME="DCS.dcsuri" CONTENT="/learning-english-gateway.htm">..."

What is actually happening? Why can't I get just the title back?
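
A likely cause, judging only from the uppercase <HTML> and <HEAD> tags in the output above, is that the page writes its title tag as <TITLE>. str.find is case-sensitive and returns -1 when nothing matches, so begin and end would both be -1 and the slice would become html[6:-1], which is nearly the whole document. The sketch below reproduces that on a made-up sample string and shows one possible case-insensitive alternative using re. The names sample_html, TITLE_RE and find_title are hypothetical and not from the original code, and the sample is not the real site's HTML.

```python
import re

# Hypothetical sample page with an uppercase title tag (not the real site's HTML).
sample_html = "<HTML><HEAD><TITLE> Learn English </TITLE></HEAD><BODY></BODY></HTML>"

# str.find is case-sensitive and returns -1 when the substring is absent.
begin = sample_html.find('<title>')
end = sample_html.find('</title>', begin)

# With begin == -1 and end == -1 the slice is sample_html[6:-1],
# i.e. everything except the first six characters and the last one.
broken_title = sample_html[begin + len('<title>'):end].strip()

# Matches <title ...>text</title> in any letter case, including across newlines.
TITLE_RE = re.compile(r'<title[^>]*>(.*?)</title>', re.IGNORECASE | re.DOTALL)


def find_title(html):
    """Return the stripped title text, or None if no title tag is found."""
    match = TITLE_RE.search(html)
    return match.group(1).strip() if match else None


print(find_title(sample_html))
```

In the code from the question this would be used as title = find_title(html). Checking the result of find for -1 before slicing would also have exposed the problem. For badly formed pages, a real HTML parser is probably more robust than a regular expression.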
