Regular expression works normally, but fails when placed in an XML schema

Posted by Eli Courtwright on Stack Overflow See other posts from Stack Overflow or by Eli Courtwright
Published on 2010-05-10T20:54:55Z Indexed on 2010/05/10 21:24 UTC
Read the original article Hit count: 374

Filed under:
|
|
|
|

I have a simple doc.xml file which contains a single root element with a Timestamp attribute:

<?xml version="1.0" encoding="utf-8"?>
<root Timestamp="04-21-2010 16:00:19.000" />

I'd like to validate this document against a my simple schema.xsd to make sure that the Timestamp is in the correct format:

<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="root">
    <xs:complexType>
      <xs:attribute name="Timestamp" use="required" type="timeStampType"/>
    </xs:complexType>
  </xs:element>
  <xs:simpleType name="timeStampType">
    <xs:restriction base="xs:string">
      <xs:pattern value="(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}" />
    </xs:restriction>
  </xs:simpleType>
</xs:schema>

So I use the lxml Python module and try to perform a simple schema validation and report any errors:

from lxml import etree

schema = etree.XMLSchema( etree.parse("schema.xsd") )
doc = etree.parse("doc.xml")

if not schema.validate(doc):
    for e in schema.error_log:
        print e.message

My XML document fails validation with the following error messages:

Element 'root', attribute 'Timestamp': [facet 'pattern'] The value '04-21-2010 16:00:19.000' is not accepted by the pattern '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}'.
Element 'root', attribute 'Timestamp': '04-21-2010 16:00:19.000' is not a valid value of the atomic type 'timeStampType'.

So it looks like my regular expression must be faulty. But when I try to validate the regular expression at the command line, it passes:

>>> import re
>>> pat = '(0[0-9]{1})|(1[0-2]{1})-(3[0-1]{1}|[0-2]{1}[0-9]{1})-[2-9]{1}[0-9]{3} ([0-1]{1}[0-9]{1}|2[0-3]{1}):[0-5]{1}[0-9]{1}:[0-5]{1}[0-9]{1}.[0-9]{3}'
>>> assert re.match(pat, '04-21-2010 16:00:19.000')
>>> 

I'm aware that XSD regular expressions don't have every feature, but the documentation I've found indicates that every feature that I'm using should work.

So what am I mis-understanding, and why does my document fail?

© Stack Overflow or respective owner

Related posts about python

Related posts about schema