Detect Unicode Usage in SQL Column

Posted on Dot net Slackers See other posts from Dot net Slackers
Published on Fri, 30 Apr 2010 00:00:00 GMT Indexed on 2010/04/30 16:59 UTC
Read the original article Hit count: 292

Filed under:

One optimization you can make to a SQL table that is overly large is to change from nvarchar (or nchar) to varchar (or char).  Doing so will cut the size used by the data in half, from 2 bytes per character (+ 2 bytes of overhead for varchar) to only 1 byte per character.  However, you will lose the ability to store Unicode characters, such as those used by many non-English alphabets.  If the tables are storing user-input, and your application is or might one day be used internationally, its likely that using Unicode for your characters is a good thing.  However, if instead the data is being generated by your application itself or your development team (such as lookup data), and you can be certain that Unicode character sets are not required, then switching such columns to varchar/char can be an easy improvement to make.

Avoid Premature Optimization

If you are working with a lookup table that has a small number of rows, and is only ever referenced in the application by its numeric ID column, then you wont see any benefit to using varchar vs. nvarchar.  More generally, for small tables, you wont see any significant benefit.  Thus, if you have a general policy in place to use nvarchar/nchar because it offers more flexibility, do not take this post as a recommendation to go against this policy anywhere you can.  You really only want to act on measurable evidence that suggests that using Unicode is resulting in a problem, and that you wont lose anything by switching to varchar/char.

Obviously the main reason to make this change is to reduce the amount of space required by each row.  This in turn affects how many rows SQL Server can page through at a time, and can also impact index size and how much disk I/O is required to respond to queries, etc.  If for example you have a table with 100 million records in it and this table has a column of type nchar(5), this column will use 5 * 2 = 10 bytes per row, and with 100M rows that works out to 10 bytes * 100 million = 1000 MBytes or 1GB.  If it turns out that this column only ever stores ASCII characters, then changing it to char(5) would reduce this to 5*1 = 5 bytes per row, and only 500MB.  Of course, if it turns out that it only ever stores the values true and false then you could go further and replace it with a bit data type which uses only 1 byte per row (100MB  total).

Detecting Whether Unicode Is In Use

So by now you think that you have a problem and that it might be alleviated by switching some columns from nvarchar/nchar to varchar/char but youre not sure whether youre currently using Unicode in these columns.  By definition, you should only be thinking about this for a column that has a lot of rows in it, since the benefits just arent there for a small table, so you cant just eyeball it and look for any non-ASCII characters.  Instead, you need a query.  Its actually very simple:

SELECT DISTINCT(CategoryName)
FROM Categories
WHERE CategoryName <> CONVERT(varchar, CategoryName)
Summary Gregg Stark for the tip.


Did you know that DotNetSlackers also publishes .net articles written by top known .net Authors? We already have over 80 articles in several categories including Silverlight. Take a look: here.



Email this Article

© Dot net Slackers or respective owner