SGML Parser in Python

Posted by afg102 on Stack Overflow
Published on 2011-01-08T08:50:13Z

I am completely new to Python. I have the following code:

import sgmllib

class FoundTitle(Exception):
    """Assumed definition (not shown in the post): signals that a title was read."""

class ExtractTitle(sgmllib.SGMLParser):

    def __init__(self, verbose=0):
        sgmllib.SGMLParser.__init__(self, verbose)
        self.title = self.data = None

    def handle_data(self, data):
        # Collect text only while inside a <title> element.
        if self.data is not None:
            self.data.append(data)

    def start_title(self, attrs):
        self.data = []

    def end_title(self):
        self.title = "".join(self.data)
        raise FoundTitle  # abort parsing!

This extracts the title element from an SGML document, but it only works for a single title. I know I have to override unknown_starttag and unknown_endtag in order to get all the titles, but I keep getting it wrong. Please help!
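One possible direction, offered as a sketch rather than a definitive answer: the code above stops at the first title because end_title raises FoundTitle, and self.title can only hold one value. Collecting each finished title into a list and not raising would likely be enough, without touching unknown_starttag or unknown_endtag. Since sgmllib was removed in Python 3, the sketch below swaps in html.parser.HTMLParser from the standard library, which uses a similar handler-method model. The names ExtractTitles and extract_titles are made up for illustration.

```python
from html.parser import HTMLParser


class ExtractTitles(HTMLParser):
    """Collect the text of every <title> element (hypothetical helper class)."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.titles = []   # finished titles, in document order
        self.data = None   # text chunks of the currently open title, else None

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.data = []  # start collecting text for this title

    def handle_data(self, data):
        # Keep text only while inside a <title> element.
        if self.data is not None:
            self.data.append(data)

    def handle_endtag(self, tag):
        if tag == "title" and self.data is not None:
            self.titles.append("".join(self.data))
            self.data = None  # stop collecting until the next <title>


def extract_titles(markup):
    """Parse the markup and return a list of all title texts."""
    parser = ExtractTitles()
    parser.feed(markup)
    parser.close()
    return parser.titles
```

Calling extract_titles on a document string returns a list with one entry per title element. With the original sgmllib class, the analogous change would be to append to a list in end_title and drop the raise.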
