Java HTML parser/validator

Posted on Stack Overflow. Published on 2010-12-24T01:40:28Z.

We allow people to enter HTML on our wiki-like site, but only a limited subset of it, so that user content neither affects our styling nor carries malicious JavaScript. Is there a good server-side Java library to ensure that the entered markup is valid?

We tried creating an XML Schema document to validate against. The only issue is that the validation libraries we used gave back cryptic error messages. What I want is for the validation library to actually fix the problem: for example, if a style="" attribute was added to an element, remove it. If fixing is not easy, it should at least let me report a message to the user with the location of the error (an error code that I can map to a friendly message is fine, probably even preferable).
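
To make the "fix it, or at least report it" behaviour concrete, here is a minimal sketch that uses only the JDK's built-in XML APIs. It is not a recommendation of a particular library, and it assumes the input is well-formed XML (XHTML-style) markup, which real-world HTML often is not. The class name `AllowlistSanitizer`, the allowlists, and the `TAG_REMOVED` / `ATTR_REMOVED` / `MALFORMED` codes are all hypothetical names chosen for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;
import org.xml.sax.SAXParseException;

public class AllowlistSanitizer {
    // Hypothetical allowlists; adjust to the subset your site permits.
    static final Set<String> ALLOWED_TAGS =
            Set.of("p", "b", "i", "em", "strong", "ul", "ol", "li", "a", "br");
    static final Set<String> ALLOWED_ATTRS = Set.of("href", "title");

    /** Cleaned markup (null if the input could not be parsed) plus one code per fix or error. */
    public static final class Result {
        public final String html;
        public final List<String> messages;
        Result(String html, List<String> messages) { this.html = html; this.messages = messages; }
    }

    public static Result sanitize(String fragment) throws Exception {
        List<String> messages = new ArrayList<>();
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Refuse DOCTYPE declarations (guards against XXE) and turn CDATA into plain text.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        factory.setCoalescing(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc;
        try {
            // The wrapper sits on its own line, so user line N is parser line N + 1.
            String wrapped = "<div>\n" + fragment + "\n</div>";
            doc = builder.parse(new InputSource(new StringReader(wrapped)));
        } catch (SAXParseException e) {
            messages.add("MALFORMED at line " + (e.getLineNumber() - 1)
                    + ", column " + e.getColumnNumber());
            return new Result(null, messages);
        }
        clean(doc.getDocumentElement(), messages);

        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc.getDocumentElement()), new StreamResult(out));
        String s = out.toString();
        // Drop the <div> wrapper that was added before parsing.
        String inner = s.substring(s.indexOf('>') + 1, s.lastIndexOf("</div>"));
        return new Result(inner.trim(), messages);
    }

    // Removes disallowed elements (with their content) and any non-text, non-element nodes.
    private static void clean(Element parent, List<String> messages) {
        Node child = parent.getFirstChild();
        while (child != null) {
            Node next = child.getNextSibling(); // saved before a possible removal
            if (child.getNodeType() == Node.ELEMENT_NODE) {
                Element el = (Element) child;
                if (!ALLOWED_TAGS.contains(el.getTagName())) {
                    parent.removeChild(el);
                    messages.add("TAG_REMOVED:" + el.getTagName());
                } else {
                    cleanAttributes(el, messages);
                    clean(el, messages);
                }
            } else if (child.getNodeType() != Node.TEXT_NODE) {
                parent.removeChild(child); // comments, processing instructions, etc.
            }
            child = next;
        }
    }

    // Removes attributes outside the allowlist, and href values that are not http(s) URLs.
    private static void cleanAttributes(Element el, List<String> messages) {
        NamedNodeMap attrs = el.getAttributes();
        for (int i = attrs.getLength() - 1; i >= 0; i--) { // backwards: the map is live
            String name = attrs.item(i).getNodeName();
            String value = attrs.item(i).getNodeValue();
            boolean safeHref = !name.equals("href")
                    || value.startsWith("http://") || value.startsWith("https://");
            if (!ALLOWED_ATTRS.contains(name) || !safeHref) {
                el.removeAttribute(name);
                messages.add("ATTR_REMOVED:" + el.getTagName() + "@" + name);
            }
        }
    }
}
```

Calling `AllowlistSanitizer.sanitize(input)` returns the cleaned markup together with a list of codes that can be mapped to friendly messages. Two limitations of this sketch: a disallowed element is dropped along with everything inside it, and input that is not well-formed XML (an unclosed `<p>`, or an entity such as `&nbsp;`) is rejected with a `MALFORMED` code instead of being repaired. A dedicated HTML sanitizing library would be the more robust choice if the input is tag soup.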

