Regex to add CDATA for malformed XML

Posted by AntonioCS on Stack Overflow
Published on 2010-06-01T17:03:01Z

Hey guys!

I have this huge XML file (13 MB) and it has some malformed values. Here is a sample of the XML:

<propertylist>
        <adprop index="0" proptype="type" value="Ft"/>
        <adprop index="0" proptype="category" value="Bs"/>
        <adprop index="0" proptype="subcategory" value="Bsm"/>
        <adprop index="0" proptype="description" value="MOONEN CUSTOM 58"/> 
</propertylist>

This sample is OK. But I have many other nodes whose values are not wrapped in CDATA and need to be. The node that gives me problems is this one:

<adprop index="0" proptype="description" value=""/> 

I created this regular expression:

<adprop index="0" proptype="description" value="(.+)"\/>

to catch that node and replace it with this:

<adprop index="0" proptype="description" value="<![CDATA[\1]]>"\/>

I ran this in Notepad++ and it works.
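For anyone reproducing this outside Notepad++, here is a minimal Python sketch of the same search/replace using `re.sub`. The pattern and replacement are copied from above; the only change is that `/` is left unescaped, since Python's `re` does not need `\/`. The helper name `wrap_description` is hypothetical, chosen just for illustration.

```python
import re

# Pattern and replacement transcribed from the Notepad++ search/replace above.
# "/" needs no escaping in Python, so "\/" is written as a plain "/".
PATTERN = r'<adprop index="0" proptype="description" value="(.+)"/>'
REPLACEMENT = r'<adprop index="0" proptype="description" value="<![CDATA[\1]]>"/>'


def wrap_description(text: str) -> str:
    """Hypothetical helper: wrap single-line description values in CDATA."""
    # "." does not match "\n" by default, so this only handles one-line values.
    return re.sub(PATTERN, REPLACEMENT, text)


single_line = '<adprop index="0" proptype="description" value="MOONEN CUSTOM 58"/>'
print(wrap_description(single_line))
# prints: <adprop index="0" proptype="description" value="<![CDATA[MOONEN CUSTOM 58]]>"/>
```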

The only problem is when the value="" spans multiple lines, like this:

  <adprop index="0" proptype="description" value="cutter that has demonstrated her offshore capabiliti from there to the Canaries with her current owner. 

Spacious homely interior with over 2m headroom and heaps of" />

The regex fails on this one, and there are plenty of nodes like it.

Can anyone help me fix the regular expression so that it also catches values that span multiple lines?

Thanks
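One possible way to handle the multi-line values, sketched in Python rather than Notepad++ and assuming the values never contain a literal `"` character. Two things keep the original pattern from matching the multi-line sample: `.` does not match a newline by default, and that sample ends in `" />` (with a space) while the pattern requires `"/>`. The revision below swaps `(.+)` for `([^"]+)`, which does match newlines and cannot run past the closing quote into a neighbouring node; it allows optional whitespace before `/>`; and it adds a negative lookahead (an addition not in the original question) that skips values already wrapped in CDATA. `wrap_description` is a hypothetical helper name.

```python
import re

# Hypothetical revision of the question's pattern (not tested in Notepad++):
#   ([^"]+)          replaces (.+): a negated class also matches newlines and
#                    stops at the closing quote, so it cannot span two nodes.
#   (?!<!\[CDATA\[)  skips values that are already wrapped in CDATA.
#   \s*              allows whitespace before "/>", as in the sample (" />").
PATTERN = re.compile(
    r'<adprop index="0" proptype="description" value="(?!<!\[CDATA\[)([^"]+)"\s*/>'
)
REPLACEMENT = r'<adprop index="0" proptype="description" value="<![CDATA[\1]]>"/>'


def wrap_description(text: str) -> str:
    """Hypothetical helper: wrap description values in CDATA, even multi-line ones."""
    return PATTERN.sub(REPLACEMENT, text)


# The multi-line node from the question, including its blank line and " />".
multi_line = (
    '<adprop index="0" proptype="description" value="cutter that has demonstrated '
    'her offshore capabiliti from there to the Canaries with her current owner. \n'
    '\n'
    'Spacious homely interior with over 2m headroom and heaps of" />'
)
result = wrap_description(multi_line)
print(result)
```

If you prefer to keep `.`, `(.+?)` with the `re.DOTALL` flag is an alternative; the non-greedy `+?` matters there, because a greedy `.+` with DOTALL could swallow everything up to the last `"/>` in the file. One further caveat worth checking before running this over the 13 MB file: the XML 1.0 grammar does not allow a literal `<` inside an attribute value, so a parser will still reject `value="<![CDATA[...]]>"`. If the end goal is a file that parses, escaping `&`, `<` and `"` in the value may be the safer route.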

© Stack Overflow or respective owner