Search and highlight html - ignoring and maintaining tags

Posted by Sleepwalker on Stack Overflow
Published on 2011-01-04T04:43:09Z Indexed on 2011/01/04 4:53 UTC

I am looking for a good way to highlight key words in a block of html without stripping the html tags. I can use a regex to search for key words in the text inside a single tag, but I haven't found a great way to search across tags. For example, if the key word phrase is "not bound" I want to be able to make this

<p>I am not<strong>bound to please thee</strong> with my answers.</p>

have the matched phrase wrapped in highlight tags, without breaking the "strong" tag (which would make the html invalid), so that it becomes:

<p>I am <span class="highlight">not</span><strong><span class="highlight">bound</span> to please thee</strong> with my answers.</p>

The main issue is maintaining the html as it is AND wrapping blocks of text with highlight tags. I need to maintain the original html. Otherwise I would strip the tags.

The best solution I can think of right now would be to make a copy of the html and place counter tokens where each space occurs, then strip all tags from the copy and search it for matching phrases. I would then compare the original and the tokenized strings to figure out where each match starts, and walk forward from there, opening and closing highlight spans as needed until the end of the match. This seems like overkill. I would like to find something more elegant if possible.

The solution would be written in C# or perhaps javascript, depending.
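
For illustration, here is a minimal JavaScript sketch of a variation on the idea above. Instead of counter tokens, it records where each run of text sits in the tag-free string, searches that string, and then wraps only the part of each text run that overlaps a match. The function name `highlightPhrase` and its parameters are hypothetical, not from any library. It assumes well-formed html with no `<` or `>` inside text, attributes, scripts or comments, it does not decode entities, and it matches case-insensitively with any whitespace between the words of the phrase. It also assumes there is a space between "not" and the "strong" tag, which the sample above lacks.

```javascript
// Hypothetical helper: wrap every occurrence of `phrase` in highlight spans
// without touching the existing tags. A sketch under the assumptions above,
// not a full html parser.
function highlightPhrase(html, phrase, className = "highlight") {
  // Splitting on a capturing group keeps the tags in the result:
  // even indexes are text runs, odd indexes are tags.
  const tokens = html.split(/(<[^>]*>)/);

  // Build the tag-free text and remember where each text run starts in it.
  let plain = "";
  const offsets = tokens.map((tok, i) => {
    if (i % 2 === 1) return -1; // tag, not part of the plain text
    const start = plain.length;
    plain += tok;
    return start;
  });

  // Turn the phrase into a regex: escape each word, allow any whitespace between.
  const pattern = phrase
    .trim()
    .split(/\s+/)
    .map((word) => word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"))
    .join("\\s+");
  const matches = [];
  for (const m of plain.matchAll(new RegExp(pattern, "gi"))) {
    matches.push([m.index, m.index + m[0].length]);
  }

  // Rebuild the html, wrapping the part of each text run that overlaps a match.
  return tokens
    .map((tok, i) => {
      if (i % 2 === 1) return tok; // tags pass through unchanged
      const base = offsets[i];
      let out = "";
      let cursor = 0; // position inside this text run
      for (const [mStart, mEnd] of matches) {
        const from = Math.max(mStart - base, cursor);
        const to = Math.min(mEnd - base, tok.length);
        if (from >= to) continue; // match does not touch this run
        out += tok.slice(cursor, from);
        out += `<span class="${className}">` + tok.slice(from, to) + "</span>";
        cursor = to;
      }
      return out + tok.slice(cursor);
    })
    .join("");
}

const input = "<p>I am not <strong>bound to please thee</strong> with my answers.</p>";
console.log(highlightPhrase(input, "not bound"));
// <p>I am <span class="highlight">not </span><strong><span class="highlight">bound</span> to please thee</strong> with my answers.</p>
```

Note that the space after "not" ends up inside the first span, because it is part of the match. Since a span never crosses a tag boundary, the original tags stay balanced. The same offset bookkeeping should carry over to C# with `Regex.Split` and `Regex.Matches`.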

© Stack Overflow or respective owner