Are UTF16 (as used by for example wide-winapi functions) characters always 2 byte long?

Posted by Cray on Stack Overflow See other posts from Stack Overflow or by Cray
Published on 2011-01-10T23:03:29Z Indexed on 2011/01/10 23:53 UTC
Read the original article Hit count: 361

Filed under:

c++

|

unicode

|

utf-8

|

utf-16

Please clarify for me, how does UTF16 work? I am a little confused, considering these points:

There is a static type in C++, WCHAR, which is 2 bytes long. (always 2 bytes long obvisouly)
Most of msdn and some other documentation seem to have the assumptions that the characters are always 2 bytes long. This can just be my imagination, I can't come up with any particular examples, but it just seems that way.
There are no "extra wide" functions or characters types widely used in C++ or windows, so I would assume that UTF16 is all that is ever needed.
To my uncertain knowledge, unicode has a lot more characters than 65535, so they obvisouly don't have enough space in 2 bytes.
UTF16 seems to be a bigger version of UTF8, and UTF8 characters can be of different lengths.

So if a UTF16 character not always 2 bytes long, how long else could it be? 3 bytes? or only multiples of 2? And then for example if there is a winapi function that wants to know the size of a wide string in characters, and the string contains 2 characters which are each 4 bytes long, how is the size of that string in characters calculated?

Is it 2 chars long or 4 chars long? (since it is 8 bytes long, and each WCHAR is 2 bytes)

© Stack Overflow or respective owner

Related posts about c++

C++ : C++ Primer (Stanley Lipmann) or The C++ programming language (special edition)

as seen on Stack Overflow - Search for 'Stack Overflow'
I have a Computer Science degree (long2 time ago) .. I do know Java OOP but i am now trying to pick up C++. I do have C and of course data structure using C or pascal. I have started reading Bjarne Stroustrup book (The C++ Programming Language - Special Edition) but find it extremely difficult esp… >>> More
Which C++ book shold I get between "C++ Primer" vs "C++ Primer Plus"

as seen on Stack Overflow - Search for 'Stack Overflow'
I want to learn C++ by using Vim and MinGW as compiler. I'm interesting at "C++ Primer (4th Edition)" and "C++ Primer Plus (5th Edition)" but I don't know how about it different. It has no book store that I can review those books, so I want to know, what is the different between those book and which… >>> More
Managed c++ std::string not accessible in unmanaged c++

as seen on Stack Overflow - Search for 'Stack Overflow'
In unmanaged c++ dll i have a function which takes constant std::string as argument Prototype : void read ( const std::string &imageSpec_ ) I call this function from managed c++ dll by passing a std::string. When i debug the unmanaged c++ code the parameter imageSpec_ shows the value correctly… >>> More
I need help on my C++ assignment using MS Visual C++

as seen on Stack Overflow - Search for 'Stack Overflow'
Ok, so I don't want you to do my homework for me, but I'm a little lost with this final assignment and need all the help I can get. Learning about programming is tough enough, but doing it online is next to impossible for me... Now, to get to the program, I am going to paste what I have so far. This… >>> More
The Definitive C++ Book Guide and List

as seen on Stack Overflow - Search for 'Stack Overflow'
After more than a few questions about deciding on C++ books I thought we could make a better community wiki version. Providing QUALITY books and an approximate skill level. Maybe we can add a short blurb/description about each book that you have personally read / benefited from. Feel free to debate… >>> More

Related posts about unicode

Translating Between Unicode and Non-Unicode Character Sets in Java

as seen on Internet.com - Search for 'Internet.com'
You can use Java APIs not only to help translate characters, strings, and text streams to other languages, but also to convert Unicode character sets to non-Unicode and vice versa. >>> More
SQLite, python, unicode, and non-utf data

as seen on Stack Overflow - Search for 'Stack Overflow'
I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just… >>> More
SQLite, python, unicode, and non-utf data

as seen on Stack Overflow - Search for 'Stack Overflow'
I started by trying to store strings in sqlite using python, and got the message: sqlite3.ProgrammingError: You must not use 8-bit bytestrings unless you use a text_factory that can interpret 8-bit bytestrings (like text_factory = str). It is highly recommended that you instead just… >>> More
notepad sql Unicode and Non Unicode

as seen on Super User - Search for 'Super User'
Hi, I have a Microsoft Notepad flate file with data and Vertical Bar as column delimiter. I get following message: cannot convert between unicode and non-unicode string data types It seems it is my nvarchar(max) that creates my problem. I changed to varchar(max); but still the same problem. How… >>> More
On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U

as seen on Super User - Search for 'Super User'
On Windows 7, dir or tree can't show unicode characters, even starting cmd with cmd /U So I would press Window Key + R to run something, and type in cmd /U so that the content might handle Unicode. And then using dir or tree /F, the content in Unicode won't show as Unicode. (in Window Explorer… >>> More