Pros and Cons of Java HTML to XML cleaners

Posted by cjavapro on Stack Overflow
Published on 2010-12-21T16:44:47Z Indexed on 2010/12/21 16:54 UTC

I want to allow HTML emails (and other HTML uploads) without letting in scripts or other active content. I plan to use a whitelist of safe tags and attributes, plus a whitelist of CSS properties with a regex for each allowed value (to prevent things like automatic return receipts).
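To make the CSS part of that plan concrete, here is a minimal sketch that uses only the JDK. The class name `StyleWhitelist`, the three properties, and their regexes are assumptions made up for this example, not a vetted policy and not taken from any particular cleaner library. It filters the contents of a single `style` attribute and keeps only declarations whose property is whitelisted and whose whole value matches that property's regex.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper: keeps only whitelisted CSS declarations from a style attribute. */
class StyleWhitelist {

    // Assumed whitelist: property name -> regex that the entire value must match.
    private static final Map<String, Pattern> ALLOWED = new LinkedHashMap<>();
    static {
        ALLOWED.put("color", Pattern.compile("#[0-9a-f]{3}|#[0-9a-f]{6}|[a-z]{3,20}"));
        ALLOWED.put("font-size", Pattern.compile("\\d{1,3}(px|pt|em|%)"));
        ALLOWED.put("text-align", Pattern.compile("left|right|center|justify"));
    }

    /** Returns the style string with every non-whitelisted declaration dropped. */
    static String filter(String style) {
        List<String> kept = new ArrayList<>();
        // Naive split on ';'. Fragments that do not parse cleanly fail the checks below.
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // not a "property: value" pair
            }
            String property = declaration.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = declaration.substring(colon + 1).trim().toLowerCase(Locale.ROOT);
            Pattern allowed = ALLOWED.get(property);
            // matches() requires the whole value to match, so url(...) and
            // expression(...) cannot ride along behind an allowed prefix.
            if (allowed != null && allowed.matcher(value).matches()) {
                kept.add(property + ": " + value);
            }
        }
        return String.join("; ", kept);
    }
}
```

With these assumed rules, `StyleWhitelist.filter("color: red; background: url(http://example.com/track.png); font-size: 12px")` returns `color: red; font-size: 12px`, so the remote image reference is dropped. The same default-deny idea applies to tags and attributes: anything not explicitly listed is removed. The split on `;` is deliberately simple and does not understand quoted strings or CSS escapes, so a real policy would need a proper CSS parser or one of the existing sanitizers.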

I asked a question about this: "Parse a badly formatted XML document (like an HTML file)"

I found that there are many ways to do this. Some systems have built-in sanitizers (which I don't care so much about).

I will post some answers and mark them Community Wiki. Please post any other options you like and mark them Community Wiki as well, so they can be voted on. Any comments or wiki edits on which parts of a given product are better and which are not would also be greatly appreciated.

This page is a very nice listing, but I get kind of lost in it: http://java-source.net/open-source/html-parsers

© Stack Overflow or respective owner
