I need to remove JavaScript tags using regular expressions and JRegex

Posted by piotr on Stack Overflow
Published on 2010-06-15T10:26:10Z Indexed on 2010/06/15 10:32 UTC

I need to remove all the JavaScript tags, along with the content in between them, and also the style tags, from the HTML code of web pages. So far I've come up with this expression: "(<[ \r\n\t]script([ \r\n\t>]|>){1,}([ \r\n\t]|.)?)|(<[ \r\n\t]noscript([ \r\n\t>]|>){1,}([ \r\n\t]|.)?)|(<[ \r\n\t]style([ \r\n\t>]|>){1,}([ \r\n\t]|.)?)"

I use the JRegex library to work with regular expressions. When I test the expression in any regex tester it works just fine, but once I run my program, it crashes with this error report:

Exception in thread "Thread-0" java.lang.StackOverflowError
    at java.util.regex.Pattern$BranchConn.match(Unknown Source)
    at java.util.regex.Pattern$BmpCharProperty.match(Unknown Source)
    at java.util.regex.Pattern$Branch.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
    at java.util.regex.Pattern$GroupTail.match(Unknown Source)
    at java.util.regex.Pattern$BranchConn.match(Unknown Source)
    at java.util.regex.Pattern$CharProperty.match(Unknown Source)
    at java.util.regex.Pattern$Branch.match(Unknown Source)
    at java.util.regex.Pattern$GroupHead.match(Unknown Source)
    at java.util.regex.Pattern$LazyLoop.match(Unknown Source)
    ..................................

And it keeps on going forever. If anyone can give me any advice on this one, I'll be very grateful.
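
For reference, here is a minimal sketch of the kind of pattern that might avoid the overflow. The frames in the trace are all java.util.regex classes, so the sketch uses java.util.regex rather than JRegex. It assumes the deep recursion comes from the lazily repeated group with an alternation inside it (the repeating Branch, GroupHead and LazyLoop frames suggest this), and replaces that group with a single `.*?` under `Pattern.DOTALL`. The class name `StripTags`, the method `strip` and the pattern itself are hypothetical and are not the expression from the question.

```java
import java.util.regex.Pattern;

public class StripTags {
    // Hypothetical pattern (an assumption, not the original expression):
    // matches <script>, <noscript> and <style> elements with their content.
    // DOTALL lets '.' match line breaks, so no alternation is needed inside
    // the repeated part; \1 requires the closing tag to match the opening one.
    private static final Pattern BLOCKS = Pattern.compile(
            "<\\s*(script|noscript|style)\\b[^>]*>.*?<\\s*/\\s*\\1\\s*>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns the input with every matched element removed.
    static String strip(String html) {
        return BLOCKS.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String html = "<p>a</p><script type=\"text/javascript\">var x = 1;\n"
                + "var y = 2;</script><p>b</p>";
        System.out.println(strip(html)); // prints <p>a</p><p>b</p>
    }
}
```

This only covers simple, well-formed markup. A closing tag that appears inside a script string literal, for example, would end the match early, so an HTML parser may be the safer route for arbitrary pages.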

© Stack Overflow or respective owner
