Java remove HTML from String without regular expressions

Posted by behrk2 on Stack Overflow
Published on 2010-03-21T22:17:10Z Indexed on 2010/03/21 22:21 UTC

Hello,

I am trying to remove all HTML elements from a String. Unfortunately, I cannot use regular expressions because I am developing on the BlackBerry platform, where regular expressions are not yet supported.

Is there any other way to remove HTML from a string? I read somewhere that a DOM parser can be used for this, but I couldn't find much information on it.

Text with HTML:

<![CDATA[As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (<a href="http://www.netflix.com/RoleDisplay/Billy_Bob_Thornton/20000303">Billy Bob Thornton</a>) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (<a href="http://www.netflix.com/RoleDisplay/Bruce_Willis/99786">Bruce Willis</a>) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task. <a href="http://www.netflix.com/RoleDisplay/Ben_Affleck/20000016">Ben Affleck</a> and <a href="http://www.netflix.com/RoleDisplay/Liv_Tyler/162745">Liv Tyler</a> co-star.]]>

Text without HTML:

As a massive asteroid hurtles toward Earth, NASA head honcho Dan Truman (Billy Bob Thornton) hatches a plan to split the deadly rock in two before it annihilates the entire planet, calling on Harry Stamper (Bruce Willis) -- the world's finest oil driller -- to head up the mission. With time rapidly running out, Stamper assembles a crack team and blasts off into space to attempt the treacherous task. Ben Affleck and Liv Tyler co-star.

Thanks!
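
For reference, one regex-free way to get from the first text to the second is a plain character scan that drops everything between `<` and `>`. The following is only a minimal sketch, not a full HTML parser: the class name `HtmlStripper` and method `stripTags` are made up for illustration, and it assumes the input is well-formed enough that `>` never appears inside an attribute value. It uses only `String` and `StringBuffer` methods, on the assumption that those are available on the target platform. The `<![CDATA[ ... ]]>` wrapper is removed first, since otherwise its leading `<` would be treated as the start of a tag.

```java
// Hypothetical helper: strips tags with a character scan instead of a regex.
final class HtmlStripper {
    private static final String CDATA_START = "<![CDATA[";
    private static final String CDATA_END = "]]>";

    static String stripTags(String input) {
        String text = input.trim();

        // Unwrap a surrounding CDATA section, if there is one.
        if (text.startsWith(CDATA_START) && text.endsWith(CDATA_END)) {
            text = text.substring(CDATA_START.length(),
                                  text.length() - CDATA_END.length());
        }

        StringBuffer out = new StringBuffer(text.length());
        boolean inTag = false;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c == '<') {
                inTag = true;            // start skipping at the opening bracket
            } else if (c == '>' && inTag) {
                inTag = false;           // resume copying after the closing bracket
            } else if (!inTag) {
                out.append(c);           // ordinary text outside any tag
            }
        }
        return out.toString();
    }
}
```

Calling `HtmlStripper.stripTags(description)` on the sample above should leave only the text outside the `<a ...>` and `</a>` tags. The sketch does not decode entities such as `&amp;`, and it does not treat comments or script blocks specially, so those cases would need extra handling.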

© Stack Overflow or respective owner
