Are UTF-16 characters (as used by, for example, the wide WinAPI functions) always 2 bytes long?

Posted by Cray on Stack Overflow
Published on 2011-01-10T23:03:29Z Indexed on 2011/01/10 23:53 UTC

Please clarify for me: how does UTF-16 work? I am a little confused, considering these points:

  • There is a fixed-size type in C++, WCHAR, which is 2 bytes long (always 2 bytes, obviously).
  • Most of MSDN and some other documentation seem to assume that characters are always 2 bytes long. This may just be my imagination, and I can't come up with any particular examples, but it seems that way.
  • There are no "extra wide" functions or character types widely used in C++ or Windows, so I would assume that UTF-16 is all that is ever needed.
  • To my uncertain knowledge, Unicode has a lot more characters than 65535, so they obviously don't all fit in 2 bytes.
  • UTF-16 seems to be a bigger version of UTF-8, and UTF-8 characters can be of different lengths.
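
The tension between the last two points can be made concrete with a small sketch. UTF-16 stores code points below U+10000 in one 16-bit code unit and everything above in a pair of code units (a surrogate pair). The function name `encode_utf16` is a hypothetical helper written for illustration, not part of WinAPI or the standard library, and it assumes the input is a valid code point (not itself a surrogate, and no larger than U+10FFFF).

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper: encode one Unicode code point as UTF-16 code units.
// Assumes cp is a valid scalar value (no validation is done here).
std::u16string encode_utf16(char32_t cp) {
    std::u16string out;
    if (cp < 0x10000) {
        // Basic Multilingual Plane: one 16-bit code unit (2 bytes).
        out.push_back(static_cast<char16_t>(cp));
    } else {
        // Supplementary planes: split the 20-bit offset into a surrogate pair (4 bytes).
        cp -= 0x10000;
        out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));   // high surrogate
        out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF))); // low surrogate
    }
    return out;
}
```

Under these assumptions, `encode_utf16(U'A')` yields one code unit and `encode_utf16(0x1F600)` yields two (0xD83D, 0xDE00), so an encoded character is always either 2 or 4 bytes, never 3.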

So if a UTF-16 character is not always 2 bytes long, how long can it be? 3 bytes? Or only multiples of 2? And if, for example, a WinAPI function wants to know the size of a wide string in characters, and the string contains 2 characters that are each 4 bytes long, how is the size of that string in characters calculated?

Is it 2 chars long or 4 chars long (since it is 8 bytes long, and each WCHAR is 2 bytes)?
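
The two possible answers correspond to two different ways of counting, which the following sketch contrasts. It uses the standard `std::u16string` as a stand-in for a WCHAR buffer; `count_code_points` is a hypothetical helper, and it assumes the string is well-formed UTF-16. Length functions that work on 16-bit units, such as `wcslen` on Windows, behave like `size()` here and count code units, not whole characters.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper: number of Unicode code points in a UTF-16 string.
// Assumes well-formed input (every low surrogate follows a high surrogate).
std::size_t count_code_points(const std::u16string& s) {
    std::size_t n = 0;
    for (char16_t u : s) {
        // A low (trailing) surrogate is the second half of a pair,
        // so it does not start a new character and is not counted.
        if (u < 0xDC00 || u > 0xDFFF) {
            ++n;
        }
    }
    return n;
}
```

For a string such as `u"\U0001F600\U0001F601"` (two characters outside the 16-bit range), `size()` reports 4 code units, which is 8 bytes, while `count_code_points` reports 2.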

© Stack Overflow or respective owner
