Replacing characters in a non well-formed XML body

Posted by ryanprayogo on Stack Overflow
Published on 2010-06-09T18:13:29Z Indexed on 2010/06/09 18:22 UTC

In some Java code that I'm working on, I sometimes have to deal with XML that is not well-formed (represented as a Java String), such as:

<root>
  <foo>
    bar & baz < quux
  </foo>
</root>

This XML will eventually need to be unmarshalled using JAXB, and as it stands it will throw an exception upon unmarshalling.

What's the best way to replace the & and the < with their character entities? For &, it's as easy as:

xml.replaceAll("&", "&amp;")
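One caveat with the plain replacement: it also rewrites ampersands that already start an entity, so an existing &lt; would become &amp;lt;. A minimal sketch that skips those, assuming only named and numeric character references need to be preserved (the class and method names here are hypothetical, chosen for illustration):

```java
import java.util.regex.Pattern;

public class AmpersandEscaper {
    // Matches '&' unless it already begins a named (&amp;), decimal (&#38;)
    // or hexadecimal (&#x26;) reference terminated by ';'.
    private static final Pattern BARE_AMP =
            Pattern.compile("&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)");

    // Hypothetical helper: escapes only the ampersands that are not part of a reference.
    public static String escapeBareAmpersands(String xml) {
        return BARE_AMP.matcher(xml).replaceAll("&amp;");
    }
}
```

The lookahead only checks the shape of a reference, not whether the entity name is actually declared, so something like &foo; would be left untouched.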

However, the < symbol is trickier, since I obviously don't want to replace the < that opens an XML tag.

Other than scanning the string and manually replacing < in the XML body with &lt;, what other option can you suggest?
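For reference, one regex-based heuristic might look like the sketch below. It assumes that a < which opens real markup is always followed immediately by a name-start character, /, ! or ?, and treats every other < as text. That assumption is mine and is not guaranteed for arbitrary input; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class LessThanEscaper {
    // Heuristic: '<' followed by a letter, '_', ':', '/', '!' or '?' is taken
    // to open a tag, comment, CDATA section or processing instruction.
    // Any other '<' (including one at the very end of the input) is treated as text.
    private static final Pattern BARE_LT = Pattern.compile("<(?![A-Za-z_:/!?])");

    // Hypothetical helper: escapes only the '<' characters that do not look like markup.
    public static String escapeBareLessThan(String xml) {
        return BARE_LT.matcher(xml).replaceAll("&lt;");
    }
}
```

Applied to the sample above, this would turn "baz < quux" into "baz &lt; quux" and leave the tags alone. It would still misread text such as "a <b" as the start of a tag, so it is a best-effort cleanup and not a substitute for a lenient parser.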

© Stack Overflow or respective owner
