Parsing XHTML with inline tags
Posted by user290796 on Stack Overflow
Published on 2010-04-16T15:19:14Z, indexed on 2010/04/16 15:23 UTC
Hi,
I'm trying to parse an XHTML document using TBXML on the iPhone (although I would be happy to use libxml2 or NSXMLParser instead if that would be easier). I need to extract the content of the body as a series of paragraphs while preserving the inline tags. For example, given this document:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.1//EN" "http://www.w3.org/TR/xhtml11/DTD/xhtml11.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
   <head>
      <title>Title</title>
      <link rel="stylesheet" href="css/style.css" type="text/css"/>
      <meta http-equiv="Content-Type" content="application/xhtml+xml; charset=utf-8"/>
   </head>
   <body>
      <div class="body">
         <div>
            <h3>Title</h3>
            <p>Paragraph with <em>inline</em> tags</p>
            <img src="image.png" />
         </div>
      </div>
   </body>
</html>
I need to extract the paragraph but keep the <em>inline</em> content in place within it. So far, every approach I've tested has returned the em element as a separate subelement, without telling me where it belongs in the paragraph's text.
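For illustration, this is how a tree API that models mixed content keeps that position information. The sketch uses Python's standard-library ElementTree as a runnable stand-in, not TBXML, libxml2 or NSXMLParser: ElementTree stores the text before the first child in the parent's `text` and the text following each child in that child's `tail`, so visiting them in that order rebuilds the paragraph with the em element in its original place. (libxml2 exposes the same information differently, as text nodes that are siblings of the em element under the p element.) The helpers `local_name`, `inner_xhtml` and `paragraphs` are hypothetical names, and attributes on inline tags are dropped to keep it short.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# Trimmed copy of the sample document from the question (DOCTYPE and head omitted).
SAMPLE = """<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <body>
    <div class="body">
      <div>
        <h3>Title</h3>
        <p>Paragraph with <em>inline</em> tags</p>
        <img src="image.png" />
      </div>
    </div>
  </body>
</html>"""


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree adds to element names."""
    return tag.split("}", 1)[1] if tag.startswith("{") else tag


def inner_xhtml(elem):
    """Hypothetical helper: rebuild an element's mixed content in document order.

    ElementTree stores the text before the first child in elem.text and the
    text that follows each child in child.tail, so emitting the text, then each
    child followed by its tail, puts every inline tag back where it was.
    Attributes on inline elements are dropped to keep the sketch short.
    """
    parts = [escape(elem.text or "")]
    for child in elem:
        name = local_name(child.tag)
        parts.append(f"<{name}>{inner_xhtml(child)}</{name}>")
        parts.append(escape(child.tail or ""))
    return "".join(parts)


def paragraphs(xhtml):
    """Return the inner XHTML of every p element, in document order."""
    root = ET.fromstring(xhtml)
    # iter() walks the whole tree depth-first, so the nested divs don't matter.
    return [inner_xhtml(p) for p in root.iter(XHTML_NS + "p")]


if __name__ == "__main__":
    for para in paragraphs(SAMPLE):
        print(para)  # Paragraph with <em>inline</em> tags
```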
Can anyone suggest a way to do this?
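Because NSXMLParser is event-driven, a streaming approach may map more directly onto it: start a buffer when a p element opens, append character data and any inline start and end tags in the order the callbacks arrive, and emit the buffer when the p element closes. The sketch below uses Python's `xml.sax` purely as a runnable stand-in; its `startElement`, `characters` and `endElement` callbacks correspond roughly to NSXMLParser's `parser:didStartElement:namespaceURI:qualifiedName:attributes:`, `parser:foundCharacters:` and `parser:didEndElement:namespaceURI:qualifiedName:` delegate methods. `ParagraphCollector` and `paragraphs` are hypothetical names; the sketch assumes unprefixed element names (as in the sample) and ignores attributes on inline tags.

```python
import xml.sax
from xml.sax.saxutils import escape

# Trimmed copy of the sample document from the question (DOCTYPE and head omitted).
SAMPLE = b"""<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en">
  <body>
    <div class="body">
      <div>
        <h3>Title</h3>
        <p>Paragraph with <em>inline</em> tags</p>
        <img src="image.png" />
      </div>
    </div>
  </body>
</html>"""


class ParagraphCollector(xml.sax.ContentHandler):
    """Hypothetical handler that collects each p element as inline XHTML.

    startElement, characters and endElement play the roles of NSXMLParser's
    didStartElement, foundCharacters and didEndElement delegate methods.
    """

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._buffer = None  # fragments of the current paragraph, or None outside one
        self._depth = 0      # how many inline elements are open inside it

    def startElement(self, name, attrs):
        if self._buffer is None:
            if name == "p":
                self._buffer = []
                self._depth = 0
        else:
            # Inline tag inside the paragraph: record it at its exact position.
            self._buffer.append(f"<{name}>")
            self._depth += 1

    def characters(self, content):
        # Text may arrive in several chunks; buffer each one in order.
        if self._buffer is not None:
            self._buffer.append(escape(content))

    def endElement(self, name):
        if self._buffer is None:
            return
        if self._depth == 0:
            # No inline element is open, so this closes the paragraph itself.
            self.paragraphs.append("".join(self._buffer))
            self._buffer = None
        else:
            self._buffer.append(f"</{name}>")
            self._depth -= 1


def paragraphs(xhtml_bytes):
    """Parse the document and return the inner XHTML of every p element."""
    handler = ParagraphCollector()
    xml.sax.parseString(xhtml_bytes, handler)
    return handler.paragraphs


if __name__ == "__main__":
    for para in paragraphs(SAMPLE):
        print(para)  # Paragraph with <em>inline</em> tags
```

The buffer-plus-depth design needs no in-memory tree, which suits a streaming parser; the trade-off is that anything beyond tag names and text, such as an href on a link, has to be serialized explicitly if it matters.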
Thanks.
© Stack Overflow or respective owner