Replace HTML entities in a string avoiding <img> tags

Posted by Xeos on Stack Overflow
Published on 2013-10-24T21:51:39Z

I have the following input:

Hi! How are you? <script>//NOT EVIL!</script>

Wassup? :P

LOOOL!!! :D :D :D

This is then run through an emoticon library and becomes:

Hi! How are you? <script>//NOT EVIL!</script>

Wassup? <img class="smiley" alt="" title="tongue, :P" src="ui/emoticons/15.gif">

LOOOL!!! <img class="smiley" alt="" title="big grin, :D" src="ui/emoticons/5.gif"> <img class="smiley" alt="" title="big grin, :D" src="ui/emoticons/5.gif"> <img class="smiley" alt="" title="big grin, :D" src="ui/emoticons/5.gif">

I have a function that escapes HTML special characters as entities to prevent XSS. Running it on the raw input would turn the first line into:

Hi! How are you? &lt;script&gt;//NOT EVIL!&lt;/script&gt;
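For reference, a minimal sketch of what such an escaping function might look like. The name escapeHtml and the exact set of escaped characters are assumptions; the question does not show the real function.

```javascript
// Hypothetical helper; stands in for the escaping function mentioned above.
function escapeHtml(text) {
  const entities = {
    "&": "&amp;",
    "<": "&lt;",
    ">": "&gt;",
    '"': "&quot;",
    "'": "&#39;",
  };
  // Replace each special character with its entity in a single pass.
  return String(text).replace(/[&<>"']/g, (ch) => entities[ch]);
}

console.log(escapeHtml("Hi! How are you? <script>//NOT EVIL!</script>"));
// prints: Hi! How are you? &lt;script&gt;//NOT EVIL!&lt;/script&gt;
```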

Now I need to escape all of the input while preserving the emoticons in their original state. For example, a <:-P emoticon should stay as it is and not become &lt;:-P.

I was thinking of running a regex split on the emotified text, processing each part on its own, and then concatenating the parts back together, but I am not sure how easily the regex could be bypassed. I know the format will always be this:

[<img class="smiley" alt="]
[empty string]
[" title="]
[one of the values from a big list]
[, ]
[another value from the list (may be matching original emoticon)]
[" src="ui/emoticons/]
[integer from Y to X]
[.gif">]
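]">
The split-and-escape idea for the format above could be sketched roughly as follows. This is only an illustration under assumptions: escapeHtml, SMILEY_TAG and escapeAroundSmileys are made-up names, and the pattern accepts any title text without a double quote rather than checking it against the real list. Because the title cannot contain a double quote, a matched tag cannot break out of the attribute, and anything that deviates from the exact tag shape (extra attributes, a different src) is escaped like ordinary text.

```javascript
// Hypothetical helper standing in for the escaping function from the question.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Matches only the exact tag shape listed above. The single capturing group
// makes split() keep each matched tag in the resulting array.
const SMILEY_TAG =
  /(<img class="smiley" alt="" title="[^"]*, [^"]*" src="ui\/emoticons\/\d+\.gif">)/;

function escapeAroundSmileys(emotified) {
  return emotified
    .split(SMILEY_TAG)
    // Even indexes hold plain text, odd indexes hold matched smiley tags.
    .map((part, i) => (i % 2 === 1 ? part : escapeHtml(part)))
    .join("");
}

const line =
  'Wassup? <img class="smiley" alt="" title="tongue, :P" src="ui/emoticons/15.gif"> <script>';
console.log(escapeAroundSmileys(line));
// prints: Wassup? <img class="smiley" alt="" title="tongue, :P" src="ui/emoticons/15.gif"> &lt;script&gt;
```

A user who types a tag in exactly this shape would get it through unescaped, but the result is still only a smiley image from ui/emoticons/, so the worst case under these assumptions is a forged tooltip.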

Matching against the list may be slow, since the regex has to run on text that may contain 20 to 40 emoticons, and there may be 5 to 15 messages to process at a time. What would be an elegant solution to this? I am open to using a third-party library or jQuery, and PHP preprocessing is possible as well.
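One possible way to keep the list check cheap is to leave the list out of the regex, capture the two title values, and look them up in a Set. The sketch below assumes that approach. All names are hypothetical, the Set contents are placeholders taken from the examples above, and it assumes the first title value never contains a comma.

```javascript
// Placeholder values from the examples above; the real list is much larger.
const KNOWN_TITLE_VALUES = new Set(["tongue", "big grin", ":P", ":D"]);

// Hypothetical helper standing in for the escaping function from the question.
function escapeHtml(text) {
  const entities = { "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;", "'": "&#39;" };
  return text.replace(/[&<>"']/g, (ch) => entities[ch]);
}

// Capture groups: 1 = first title value, 2 = second title value, 3 = image number.
const SMILEY_PARTS =
  /<img class="smiley" alt="" title="([^",]*), ([^"]*)" src="ui\/emoticons\/(\d+)\.gif">/g;

function escapeMessage(emotified) {
  let out = "";
  let last = 0;
  for (const m of emotified.matchAll(SMILEY_PARTS)) {
    // A tag is kept verbatim only if both title values are on the list.
    const trusted = KNOWN_TITLE_VALUES.has(m[1]) && KNOWN_TITLE_VALUES.has(m[2]);
    out += escapeHtml(emotified.slice(last, m.index));
    out += trusted ? m[0] : escapeHtml(m[0]);
    last = m.index + m[0].length;
  }
  // Escape whatever text follows the last tag.
  return out + escapeHtml(emotified.slice(last));
}

// Processing a batch of messages is then a single map() call.
const messages = [
  "Hi! How are you? <script>//NOT EVIL!</script>",
  'Wassup? <img class="smiley" alt="" title="tongue, :P" src="ui/emoticons/15.gif">',
];
console.log(messages.map(escapeMessage));
```

Each message is scanned once and each tag costs two Set lookups, so a few dozen emoticons across 5 to 15 messages should be cheap, though that is worth measuring on real data.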
