Regular expression to match empty HTML tags that may contain embedded JSTL?

Posted by Keith Bentrup on Stack Overflow
Published on 2009-11-10T05:10:47Z Indexed on 2010/04/01 22:23 UTC

I'm trying to construct a regular expression to look for empty HTML tags that may have embedded JSTL. I'm using Perl for my matching.

So far I can match any empty HTML tag that does not contain JSTL with the following:

/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si

The \b(?!:) avoids matching an opening JSTL tag, but it doesn't address whether JSTL may appear within the HTML tag itself (which is allowable). I only want to know whether the HTML tag has no children (it is empty or contains only whitespace). So I'm looking for a pattern that would match both of the following:

<div id="my-id"> 
</div>
<div class="<c:out var="${my.property}" />"></div>

Currently the first div matches. The second does not. Is it doable? I tried several variations using lookahead assertions, and I'm starting to think it's not. However, I can't say for certain or articulate why it's not.
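
To make the behavior above reproducible, here is a minimal Perl sketch that runs the pattern from the question against both samples, plus a third non-empty sample added purely for contrast. The original pattern fails on the second div because `[^<]*?` cannot cross the `<` of the embedded `<c:out`. The `$variant` pattern is a hypothetical adjustment, not a vetted solution: it assumes any embedded tag is namespaced (`prefix:name`), is not nested, and contains no `>` inside its own attribute values. The names `$original`, `$variant`, and `%samples` are illustrative only.

```perl
use strict;
use warnings;

# The pattern from the question, unchanged.
my $original = qr/<\w+\b(?!:)[^<]*?>\s*<\/\w+/si;

# Hypothetical variant (an assumption, not a vetted solution): inside the
# opening tag, accept either a single non-angle-bracket character or one
# complete namespaced tag such as <c:out ... />.
my $variant = qr/<\w+\b(?!:)(?:[^<>]|<\w+:[^<>]*>)*>\s*<\/\w+>/si;

my %samples = (
    # The two samples from the question.
    plain => qq{<div id="my-id"> \n</div>},
    jstl  => q{<div class="<c:out var="${my.property}" />"></div>},
    # A non-empty div, added here only as a negative case.
    full  => q{<div class="x"><p>text</p></div>},
);

for my $name (sort keys %samples) {
    my $text = $samples{$name};
    printf "%-5s original=%-8s variant=%s\n",
        $name,
        ($text =~ $original ? 'match' : 'no match'),
        ($text =~ $variant  ? 'match' : 'no match');
}
```

Under these assumptions, the original pattern matches only the `plain` sample, while the variant matches both `plain` and `jstl` and rejects `full`. Nested JSTL tags or a `>` inside an EL expression would still defeat the variant, which is the usual limit of a single regular expression on nested markup.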

Edit: I'm not writing something to interpret the code, and I'm not interested in using a parser. I'm writing a script to point out potential issues/oversights. And at this point, I'm curious, too, to see if there is something clever with lookaheads or lookbehinds that I may be missing. If it bothers you that I'm trying to "solve" a problem this way, don't think of it as looking for a solution. To me it's more of a challenge now, and an opportunity to learn more about regular expressions.

Also, if it helps, you can assume that the HTML is XHTML Strict.

© Stack Overflow or respective owner
