How do I read UTF-8 characters via a pointer?

Posted by Jen on Stack Overflow
Published on 2010-06-01T08:41:28Z Indexed on 2010/06/01 8:43 UTC

Suppose I have UTF-8 content stored in memory. How do I read the characters using a pointer? I presume I need to check the high (8th) bit of each byte, which indicates a multi-byte character, but how exactly do I turn the byte sequence into a valid Unicode code point? Also, is wchar_t the proper type to store a single Unicode character?

This is what I have in mind:


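   wchar_t readNextChar (char** p)
   {
       // Dereference first, then advance the caller's pointer. Writing *p++
       // would advance the char** itself and yield a char*, not a char.
       unsigned char ch = static_cast<unsigned char>(*(*p)++);
       if (ch & 0x80)
       {
           // High bit set: this is a multi-byte character, what do I do now?
           // unsigned char chNext = static_cast<unsigned char>(*(*p)++);
           // ... but how do I assemble the Unicode character?
           ...
       }
       ...
   }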
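
One possible way to fill in the missing part, offered as a minimal sketch rather than a definitive implementation. The lead byte's high bits give the sequence length (110xxxxx: 2 bytes, 1110xxxx: 3 bytes, 11110xxx: 4 bytes), and each continuation byte (10xxxxxx) contributes six more bits. The sketch makes several assumptions that are not in the question: the buffer is NUL-terminated, malformed input is reported as U+FFFD, the parameter is `const char**`, and the return type is `char32_t` instead of `wchar_t`, because `wchar_t` is only 16 bits wide on Windows and cannot hold code points above U+FFFF. The names `kReplacement` and `decodeAll` are hypothetical helpers for illustration.

```cpp
#include <string>

// Returned for malformed input (U+FFFD REPLACEMENT CHARACTER); an assumed convention.
constexpr char32_t kReplacement = 0xFFFD;

// Decodes one code point starting at *p and advances *p past the bytes consumed.
// Assumes a NUL-terminated buffer, so a truncated sequence stops at the
// terminator (0x00 is never a continuation byte).
char32_t readNextChar(const char** p)
{
    const unsigned char* s = reinterpret_cast<const unsigned char*>(*p);
    unsigned char lead = *s++;
    char32_t cp = 0;
    int extra = 0;  // number of continuation bytes expected

    if (lead < 0x80)                { cp = lead;        extra = 0; } // 0xxxxxxx
    else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; } // 110xxxxx
    else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; } // 1110xxxx
    else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; } // 11110xxx
    else {
        // Stray continuation byte or invalid lead byte: skip it.
        *p = reinterpret_cast<const char*>(s);
        return kReplacement;
    }

    for (int i = 0; i < extra; ++i) {
        if ((*s & 0xC0) != 0x80) {  // not of the form 10xxxxxx
            *p = reinterpret_cast<const char*>(s);  // leave the bad byte unread
            return kReplacement;
        }
        cp = (cp << 6) | (*s++ & 0x3F);  // append the low six bits
    }
    *p = reinterpret_cast<const char*>(s);

    // Reject overlong encodings, UTF-16 surrogates, and values beyond U+10FFFF.
    static const char32_t minForLen[] = {0x0, 0x80, 0x800, 0x10000};
    if (cp < minForLen[extra] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return kReplacement;
    return cp;
}

// Usage: walk a NUL-terminated buffer and collect its code points.
std::u32string decodeAll(const char* text)
{
    std::u32string out;
    while (*text != '\0')
        out.push_back(readNextChar(&text));
    return out;
}
```

The caller is expected to check for the terminating NUL before each call, as `decodeAll` does. For a buffer that is not NUL-terminated, the function would need an end pointer as well so the continuation-byte loop cannot read past the end.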
