Tricky issue with using XSLT on badly formed HTML
Posted by Ryba on Stack Overflow
        
        Published on 2010-05-21T22:30:49Z
Indexed on 2010/05/21 22:50 UTC
        
        
        
Hi there, I am fairly new to XSLT (2.0) and am having some trouble with a tricky issue. Essentially, I have a badly formed HTML file like the one below:
<html>
<body>
<p> text 1 </p>
<div> <p> text 2</p> </div>
<p> Here is a list
    <ul>
        <ol> 
            <li> ListItem1 </li>
        <li> ListItem1 </li>
    </ol>
    <dl>
        <li> dl item </li>
        <li> dl item2 </li>
    </dl>
</ul> 
<div>
<p> I was here</p>
</div>
</p>
I am trying to turn it into a nicely formatted XML file. In my XSLT file I recursively check whether all children of a p or div are themselves p's or div's; if so, I just promote them, otherwise I treat them as standalone paragraphs. I extended this idea so that a p or div with a child list shows up properly, but the list's own children are not promoted.
The problem is that the output XML I get is the following:
<?xml version="1.0" encoding="utf-8"?><html>
<body>
<p> text 1 </p>
 <p> text 2</p> 
 Here is a list
<ul>
    <ol> 
        <li> ListItem1 </li>
        <li> ListItem1 </li>
    </ol>
    <dl>
        <li> dl item </li>
        <li> dl item2 </li>
    </dl>
</ul> 
<p> I was here</p>
"Here is a list" needs to be in paragraph tags too! I am going crazy trying to solve this ... Any input/links would be greatly appreciated.
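One possible direction, offered as a rough sketch and not a tested drop-in: when a p or div is unwrapped, group its children with XSLT 2.0's xsl:for-each-group so that runs of loose text and inline nodes between block elements are wrapped in their own p. This assumes the input can at least be parsed as well-formed XML (every tag closed), that the elements are in no namespace, and that "block" means only p, div, ul, ol and dl; that list is an assumption and would need extending for real documents.

```xslt
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" encoding="utf-8"/>

  <!-- Identity template: copy anything not handled by a more specific rule.
       Lists (ul, ol, dl) and their children fall through to here, so list
       items are copied in place and never promoted. -->
  <xsl:template match="@* | node()">
    <xsl:copy>
      <xsl:apply-templates select="@* | node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- A p or div that has at least one block-level child is unwrapped.
       Its children are split into adjacent runs of "block" and "non-block"
       nodes (the set of block names is an assumption of this sketch). -->
  <xsl:template match="*[self::p or self::div][p | div | ul | ol | dl]">
    <xsl:for-each-group select="node()"
        group-adjacent="boolean(self::p | self::div | self::ul | self::ol | self::dl)">
      <xsl:choose>
        <!-- Run of block elements: promote them as they are. -->
        <xsl:when test="current-grouping-key()">
          <xsl:apply-templates select="current-group()"/>
        </xsl:when>
        <!-- Run of text/inline nodes with real content: wrap it in a p. -->
        <xsl:when test="normalize-space(string-join(current-group(), ''))">
          <p>
            <xsl:apply-templates select="current-group()"/>
          </p>
        </xsl:when>
        <!-- Whitespace-only runs between blocks are dropped. -->
      </xsl:choose>
    </xsl:for-each-group>
  </xsl:template>

</xsl:stylesheet>
```

With the sample input above, the second template matches the p that starts with "Here is a list": the leading text forms a non-block run and is emitted inside its own p, while the ul and the trailing div form block runs and are processed as before. A p or div with no block-level children does not match the second template and is copied unchanged by the identity rule.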
© Stack Overflow or respective owner