Regex, encoding, and characters that look alike

Posted by hack.augusto on Stack Overflow
Published on 2010-03-24T16:54:31Z


First, a brief example: let's say I have the regex "/[0-9]{2}°/" and the text "24º". Obviously the text won't match... or will it? Really, it depends on the character encoding.
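
To make the two characters concrete, here is a small sketch in Python (the question doesn't name a language, so Python is only an assumption for illustration). It shows that ° and º are different code points, so a pattern with only ° cannot match "24º", while a bracket class listing both does match when working on decoded text. It also shows why encoding matters: if the same pattern is applied to raw UTF-8 bytes, each of these characters is two bytes long, and the bracket class turns into a set of single bytes. The names `DEGREE`, `ORDINAL`, `describe`, `strict_pattern`, `text_pattern` and `byte_pattern` are illustrative only.

```python
import re
import unicodedata

DEGREE = "\u00b0"   # ° DEGREE SIGN
ORDINAL = "\u00ba"  # º MASCULINE ORDINAL INDICATOR


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# The original pattern: only the degree sign is accepted.
strict_pattern = re.compile("[0-9]{2}" + DEGREE)

# Pattern on decoded text: each member of the class is one code point.
text_pattern = re.compile("[0-9]{2}[" + DEGREE + ORDINAL + "]")

# The same pattern encoded to UTF-8: ° is C2 B0 and º is C2 BA, so the
# class becomes a set of single bytes {C2, B0, BA} and matches one byte only.
byte_pattern = re.compile(("[0-9]{2}[" + DEGREE + ORDINAL + "]").encode("utf-8"))

if __name__ == "__main__":
    for ch in (DEGREE, ORDINAL):
        sample = "24" + ch
        print(describe(ch))
        print("  strict fullmatch:", bool(strict_pattern.fullmatch(sample)))
        print("  text fullmatch:  ", bool(text_pattern.fullmatch(sample)))
        print("  bytes fullmatch: ", bool(byte_pattern.fullmatch(sample.encode("utf-8"))))
```

If the slash-delimited pattern is meant for PHP's `preg_*` functions (the delimiters suggest it, but the question doesn't say), the same byte-wise behavior applies there unless the `u` modifier is set.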

Here is my problem: I have no control over which characters the user types, so I need to cover every possibility in the regex, e.g. /[0-9]{2}[°º]/, or, even better, make sure the text contains only the character I expect (°). But I can't just remove the unknown characters, otherwise the regex won't match; I need to replace each one with the expected character it looks like. I have done this with a small function that maps each look-alike to the character I expect, but I haven't covered every possibility. For example, today I found a new "-" variant, so now there are three of them, just like LaTeX =D (- -- ---). Cool, but the regex didn't work.
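
The mapping approach described above can be sketched as follows, again assuming Python and assuming that hyphen-minus `-` and the degree sign `°` are the expected characters. Besides a hand-maintained table (the hypothetical `LOOKALIKES` dict), it folds every character in the Unicode "Dash punctuation" category (`Pd`, which includes the hyphen, en dash and em dash) into `-`, so a dash variant that turns up later is handled without adding a new table entry. `LOOKALIKES`, `normalize_lookalikes` and `PATTERN` are illustrative names, not part of any library.

```python
import re
import unicodedata

# Hypothetical look-alike table; extend it as new variants turn up.
LOOKALIKES = {
    "\u00ba": "\u00b0",  # º MASCULINE ORDINAL INDICATOR -> ° DEGREE SIGN
    "\u02da": "\u00b0",  # ˚ RING ABOVE -> ° DEGREE SIGN
    "\u2212": "-",       # − MINUS SIGN is category Sm, not Pd, so map it explicitly
}

# The pattern from the question, with only the expected degree sign.
PATTERN = re.compile("[0-9]{2}\u00b0")


def normalize_lookalikes(text: str) -> str:
    """Replace characters that resemble the expected ones with the expected ones."""
    out = []
    for ch in text:
        if ch in LOOKALIKES:
            out.append(LOOKALIKES[ch])
        elif unicodedata.category(ch) == "Pd":
            # Any dash punctuation (hyphen, en dash, em dash, ...) becomes "-".
            out.append("-")
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    for raw in ("24\u00ba", "10\u201320", "a\u2014b"):
        cleaned = normalize_lookalikes(raw)
        print(repr(raw), "->", repr(cleaned), bool(PATTERN.search(cleaned)))
```

The category rule only helps where Unicode already groups the look-alikes; no property groups "things that look like a degree sign", so º still needs an explicit entry. NFKC normalization is not a substitute here either: it turns º into a plain `o`, not into °.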

Does anyone know how I might solve this?

© Stack Overflow or respective owner
