PHP - Processing Invalid XML

Posted by Paul on Stack Overflow
Published on 2010-05-22T23:16:30Z Indexed on 2010/05/22 23:20 UTC

I'm using SimpleXML to load in some xml files (which I didn't write/provide and can't really change the format of).

Occasionally (e.g. one or two files out of every 50 or so) they don't escape special characters (mostly &, but sometimes other random invalid things too). This creates an issue because SimpleXML in PHP simply fails to load the file, and I don't know of any good way to handle parsing invalid XML.

My first idea was to preprocess the XML as a string and wrap ALL field values in CDATA so it would parse, but for some ungodly reason the XML I need to process puts all of its data in attributes, and CDATA sections are not allowed inside attribute values. Thus I can't use the CDATA idea. An example of the XML:

 <Author v="By Someone & Someone" />

What's the best way to preprocess this so that all the invalid characters are replaced or escaped before I load the XML with SimpleXML?
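To make the kind of preprocessing I have in mind concrete, here is a minimal sketch that covers only the bare & case. The function names escapeBareAmpersands and loadLenientXml are made up for illustration; the calls they rely on (preg_replace, simplexml_load_string, libxml_use_internal_errors, libxml_clear_errors) are standard PHP. It assumes that any & followed by something that looks like an entity reference (&name;, &#38;, &#x26;) is intentional and leaves it alone, so it would misread text such as "AT&T; Inc", and an undeclared named entity like &nbsp; would still make the parse fail.

```php
<?php
// Hypothetical helper: escape every & that does not start something
// that looks like an entity reference (&name;, &#123;, &#x1F;).
function escapeBareAmpersands(string $xml): string
{
    $pattern = '/&(?!(?:[A-Za-z_][A-Za-z0-9_.-]*|#[0-9]+|#x[0-9A-Fa-f]+);)/';
    return preg_replace($pattern, '&amp;', $xml);
}

// Hypothetical helper: try a normal parse first, and only fall back to
// the escaped string when that fails. Returns SimpleXMLElement or false.
function loadLenientXml(string $xml)
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);

    $doc = simplexml_load_string($xml);
    if ($doc === false) {
        libxml_clear_errors();
        $doc = simplexml_load_string(escapeBareAmpersands($xml));
    }

    // Restore the caller's error-handling mode.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    return $doc;
}

$doc = loadLenientXml('<Author v="By Someone & Someone" />');
if ($doc !== false) {
    echo $doc['v'], "\n"; // prints: By Someone & Someone
}
```

Parsing the untouched string first means the 48 or so well-formed files never go through the regex at all. What this does not handle is the "other random invalid things", which is really the part I'm unsure about.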

© Stack Overflow or respective owner
