Cleaning mixed type <script> tags

Posted by yossale on Stack Overflow
Published on 2010-04-26T12:39:16Z

I'm cleaning HTML using CyberNeko (NekoHTML) and Xerces. However, some websites still use BOTH

<script>...</script> and <script.../> 

So what happens is this: given

<script..../> <div> Some Text </div> <script> scripting stuff </script> , 

neko parses the whole line above as a single script, so I get

<script..../> &lt;div&gt; Some Text &lt;/div&gt; &lt;script&gt; scripting stuff </script> ,

And then I lose all the inside content :(

Any advice?
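
One possible workaround, sketched here rather than offered as a confirmed fix: rewrite every self-closing <script.../> into an explicit <script...></script> pair before the markup reaches the parser, so the parser never sees an unterminated script element. The class name ScriptTagNormalizer, the method normalize, and the src="a.js" attribute in the usage line are all hypothetical and used only for illustration. The sketch relies solely on java.util.regex and does not touch any NekoHTML or Xerces API.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of NekoHTML or Xerces): pre-processes raw HTML text.
final class ScriptTagNormalizer {

    // Matches a self-closing script tag such as <script src="a.js"/>.
    // Quoted attribute values are consumed as whole units so that a '>' or '/'
    // inside a value does not end the match early.
    private static final Pattern SELF_CLOSING_SCRIPT = Pattern.compile(
            "<script\\b((?:\"[^\"]*\"|'[^']*'|[^'\">])*?)\\s*/>",
            Pattern.CASE_INSENSITIVE);

    private ScriptTagNormalizer() {
    }

    // Returns the input with every <script .../> rewritten as <script ...></script>.
    // Regular <script>...</script> blocks are left untouched.
    static String normalize(String html) {
        Matcher matcher = SELF_CLOSING_SCRIPT.matcher(html);
        return matcher.replaceAll("<script$1></script>");
    }
}
```

Calling ScriptTagNormalizer.normalize on the input <script src="a.js"/> <div> Some Text </div> <script> scripting stuff </script> yields <script src="a.js"></script> <div> Some Text </div> <script> scripting stuff </script>, which can then be handed to the parser as usual. Being regex-based, this is only an approximation: text that merely looks like a self-closing script tag inside a JavaScript string or an HTML comment would be rewritten as well.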

© Stack Overflow or respective owner
