Why is Swing Parser's handleText not handling nested tags?
- by Jim P
I need to transform some HTML text that has nested tags so that each search 'match' is wrapped in a span with a CSS class that highlights it (like Firefox's find-in-page).
I can't just do a simple replace (think of a user searching for "img", for example), so I'm trying to do the replace only within the body text, not inside tags or their attributes.
I have a pretty straightforward HTML parser that I think should do this:
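// srch is the user's search string; output holds the HTML being searched.
final Pattern pat = Pattern.compile(srch, Pattern.CASE_INSENSITIVE);
Matcher m = pat.matcher(output);
if (m.find()) {
    final StringBuffer ret = new StringBuffer(output.length()+100);
    // lastPos is declared outside this snippet (it has to be a field
    // for the anonymous class below to assign to it).
    lastPos=0;
    try {
        new ParserDelegator().parse(new StringReader(output.toString()),
        new HTMLEditorKit.ParserCallback () {
            public void handleText(char[] data, int pos) {
                // Copy everything between the previous text run and this one (the tags) unchanged.
                ret.append(output.subSequence(lastPos, pos));
                // Wrap each match inside this text run in a highlighting span.
                Matcher textMatcher = pat.matcher(new String(data));
                ret.append(textMatcher.replaceAll("<span class=\"search\">$0</span>"));
                lastPos=pos+data.length;
            }
        }, false);
        // Copy whatever follows the last text run.
        ret.append(output.subSequence(lastPos, output.length()));
        return ret;
    } catch (Exception e) {
        return output;
    }
}
return output;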
My problem is that when I debug this, handleText gets called with text that includes tags! It's as if the parser only goes one level deep. Does anyone know why? Is there some simple thing I need to do to HTMLParser (I haven't used it much) to get 'proper' handling of nested tags?
PS - I figured it out myself - see answer below.  Short answer is, it works fine if you pass it HTML, not pre-escaped HTML (where the tags have already been turned into &lt;span&gt; and so on).  Doh!  Hope this helps someone else.
<span>example with <a href="#">nested</a> <p>more nesting</p>
</span> <!-- all this gets thrown together -->
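For anyone who wants to see the difference from the PS for themselves, here is a minimal sketch. The class name HandleTextDemo and the helper textChunks are made up for illustration and are not part of my original code. It feeds the same markup to ParserDelegator twice, once as real HTML and once entity-escaped, and collects whatever handleText receives each time.

```java
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class, for illustration only.
class HandleTextDemo {

    // Returns every text run the Swing parser passes to handleText.
    static List<String> textChunks(String html) {
        final List<String> chunks = new ArrayList<>();
        try {
            new ParserDelegator().parse(new StringReader(html),
                new HTMLEditorKit.ParserCallback() {
                    @Override
                    public void handleText(char[] data, int pos) {
                        chunks.add(new String(data));
                    }
                }, false);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return chunks;
    }

    public static void main(String[] args) {
        String raw = "<span>example with <a href=\"#\">nested</a></span>";
        // Simulate "pre-escaped" input: escape & first, then the angle brackets.
        String escaped = raw.replace("&", "&amp;")
                            .replace("<", "&lt;")
                            .replace(">", "&gt;");

        // Real HTML: handleText only sees the text between the tags.
        System.out.println(textChunks(raw));
        // Escaped HTML: the parser decodes the entities and hands the
        // markup over as ordinary text, angle brackets and all.
        System.out.println(textChunks(escaped));
    }
}
```

With the raw string the callback should only ever see the text between tags. With the escaped string there are no tags for the parser to find, so the whole thing arrives in handleText as text, which is exactly what I was seeing in the debugger.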