How can I check if a binary string is valid UTF-8 in MySQL?

Posted by Piotr Czapla on Stack Overflow
Published on 2010-02-04T13:05:46Z. Indexed on 2010/03/14 18:25 UTC.

I've found a Perl regexp that checks whether a string is valid UTF-8 (the regexp is from the W3C site).

$field =~
  m/\A(
     [\x09\x0A\x0D\x20-\x7E]            # ASCII
   | [\xC2-\xDF][\x80-\xBF]             # non-overlong 2-byte
   |  \xE0[\xA0-\xBF][\x80-\xBF]        # excluding overlongs
   | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}  # straight 3-byte
   |  \xED[\x80-\x9F][\x80-\xBF]        # excluding surrogates
   |  \xF0[\x90-\xBF][\x80-\xBF]{2}     # planes 1-3
   | [\xF1-\xF3][\x80-\xBF]{3}          # planes 4-15
   |  \xF4[\x80-\x8F][\x80-\xBF]{2}     # plane 16
  )*\z/x;

But I'm not sure how to port it to MySQL, as it seems that MySQL regular expressions don't support the hex representation of characters such as \xC2 (see this question).

Any thoughts on how to port the regexp to MySQL? Or do you know another way to check whether a string is valid UTF-8?

UPDATE: I need this check to work inside MySQL, because I have to run it on the server to correct broken tables. I can't pass the data through a script, as the database is around 1 TB.
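
One direction that might work on the server is to match against HEX(field) instead of the raw bytes, so that each \xNN byte class in the Perl pattern becomes a pair of uppercase hex digits and no hex escapes are needed. The query below is only a sketch of that translation: `my_table`, `id`, and `field` are placeholder names, it has not been run against real data, and its cost on a table of this size is unknown.

```sql
-- Sketch only: my_table, id and field are hypothetical names.
-- Lists rows whose bytes do NOT match the W3C UTF-8 pattern.
-- HEX() returns two uppercase hex digits per byte, so every
-- byte range from the Perl regexp is rewritten as a digit pair.
SELECT id
FROM my_table
WHERE HEX(field) NOT REGEXP CONCAT(
  '^(',
  '0[9AD]|[2-6][0-9A-F]|7[0-9A-E]',        -- ASCII: 09, 0A, 0D, 20-7E
  '|(C[2-9A-F]|D[0-9A-F])[89AB][0-9A-F]',  -- non-overlong 2-byte
  '|E0[AB][0-9A-F][89AB][0-9A-F]',         -- excluding overlongs
  '|E[1-9ABCEF]([89AB][0-9A-F]){2}',       -- straight 3-byte
  '|ED[89][0-9A-F][89AB][0-9A-F]',         -- excluding surrogates
  '|F0[9AB][0-9A-F]([89AB][0-9A-F]){2}',   -- planes 1-3
  '|F[1-3]([89AB][0-9A-F]){3}',            -- planes 4-15
  '|F48[0-9A-F]([89AB][0-9A-F]){2}',       -- plane 16
  ')*$'
);
```

Every alternative consumes a whole number of digit pairs starting at `^`, so the match should not drift onto odd hex-digit boundaries. Like the Perl original, this rejects control characters other than tab, LF, and CR, and rows where `field` is NULL are skipped because the comparison evaluates to NULL.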

© Stack Overflow or respective owner
