How do you match only valid roman numerals with a regular expression?

Posted by Daniel Magliola on Stack Overflow See other posts from Stack Overflow or by Daniel Magliola
Published on 2008-11-06T01:29:25Z Indexed on 2010/05/12 2:24 UTC
Read the original article Hit count: 356

Filed under:
|

Thinking about my other problem, i decided I can't even create a regular expression that will match roman numerals (let alone a context-free grammar that will generate them)

The problem is matching only valid roman numerals. Eg, 990 is NOT "XM", it's "CMXC"

My problem in making the regex for this is that in order to allow or not allow certain characters, I need to look back. Let's take thousands and hundreds, for example.

I can allow M{0,2}C?M (to allow for 900, 1000, 1900, 2000, 2900 and 3000). However, If the match is on CM, I can't allow following characters to be C or D (because I'm already at 900).

How can I express this in a regex?
If it's simply not expressible in a regex, is it expressible in a context-free grammar?

Thanks for any pointers!

© Stack Overflow or respective owner

Related posts about exercise

Related posts about regex