Sniffing out SQL Code Smells: Inconsistent use of Symbolic names and Datatypes
- by Phil Factor
It is an awkward feeling. You’ve just delivered a database application that 
seems to be working fine in production, and you just run a few checks on it. You discover that there is a potential bug 
that, out of sheer good chance, hasn’t kicked in to produce an error; but it lurks, like a smoking bomb. Worse, maybe 
you find that the bug has started its evil work of corrupting the data, but in ways that nobody has, so far detected. 
You investigate, and find the damage. You are somehow going to have to repair it. 
Yes, it still very occasionally happens to me. It is not a nice feeling, and I 
do anything I can to prevent it happening. That’s why I’m interested in SQL code smells. SQL Code Smells aren’t 
necessarily bad practices, but just show you where to focus your attention when checking an application. 
Sometimes with databases the bugs can be subtle. SQL is rather like HTML: the 
language does its best to try to carry out your wishes, rather than to be picky about your bugs. Most of the time, this 
is a great benefit, but not always. One particular place where this can be detrimental is where you have implicit conversion between different data types. Most of the time it is completely harmless but we’re 
 concerned about the occasional time it isn’t. Let’s give an example: String truncation. Let’s give another even 
more frightening one, rounding errors on assignment to a number of different precision. Each requires a blog-post to 
explain in detail and I’m not now going to try. Just remember that it is not always a good idea to assign data to 
variables, parameters or even columns when they aren’t the same datatype, especially if you are relying on implicit 
conversion to work its magic.For details of the problem and the consequences, see here:  SR0014: Data loss might occur when casting from {Type1} to {Type2} . For any experienced Database Developer, this is a more frightening read than a Vampire Story.
This is why one of the SQL Code Smells that makes me edgy, in my own or other 
peoples’ code, is to see parameters, variables and columns that have the same names and different datatypes. Whereas 
quite a lot of this is perfectly normal and natural, you need to check in case one of two things have gone wrong. Either 
sloppy naming, or mixed datatypes. Sure it is hard to remember whether you decided that the length of a log entry was 80 
or 100 characters long, or the precision of a number. That is why a little check like this I’m going to show you is 
excellent for tidying up your code before you check it back into source Control! 
1/ Checking Parameters only
If you were just going to check parameters, you might just do this. It simply 
groups all the parameters, either input or output, of all the routines (e.g. stored procedures or functions) by their 
name and checks to see, in the HAVING clause, whether their data types are all the same. If not, it lists all the 
examples and their origin (the routine) 
Even this little check can occasionally be scarily revealing.  
;WITH userParameter AS  ( SELECT   c.NAME AS ParameterName,  OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,  t.name + ' '     + CASE     --we may have to put in the length            WHEN t.name IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN c.max_length = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN t.name IN ('nchar', 'nvarchar')                      THEN c.max_length / 2 ELSE c.max_length                    END)                END + ')'         WHEN t.name IN ('decimal', 'numeric')             THEN '(' + CONVERT(VARCHAR(4), c.precision)                   + ',' + CONVERT(VARCHAR(4), c.Scale) + ')'         ELSE ''      END  --we've done with putting in the length      + CASE WHEN XML_collection_ID <> 0         THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                    THEN 'DOCUMENT '                    ELSE 'CONTENT '                   END              + COALESCE(               (SELECT QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)                FROM sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE sc.xml_collection_ID = c.XML_collection_ID),'NULL') + ')'          ELSE ''         END        AS [DataType]  FROM sys.parameters c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'   AND parameter_id>0)SELECT CONVERT(CHAR(80),objectName+'.'+ParameterName),DataType FROM UserParameterWHERE ParameterName IN   (SELECT ParameterName FROM UserParameter    GROUP BY ParameterName    HAVING MIN(Datatype)<>MAX(DataType))ORDER BY ParameterName  
so, in a very small example here, we have a @ClosingDelimiter variable that is only CHAR(1) when, by the looks of it, it should be up to ten characters long, or even worse, a function that should be a char(1) and seems  to let in a string of ten characters. Worth investigating. Then we have a 
@Comment variable that can't decide whether it is a VARCHAR(2000) or a VARCHAR(MAX)
2/ Columns and Parameters
Actually, once we’ve cleared up the mess we’ve made of our parameter-naming in 
the database we’re inspecting, we’re going to be more interested in listing both columns and parameters. We can do this 
by modifying the routine to list columns as well as parameters. Because of the slight complexity of creating the string 
version of the datatypes, we will create a fake table of both columns and parameters so that they can both be processed 
the same way. After all, we want the datatypes to match 
Unfortunately, parameters do not expose all the attributes we are interested 
in, such as whether they are nullable (oh yes, subtle bugs happen if this isn’t consistent for a datatype). We’ll have 
to leave them out for this check. 
Voila! A slight modification of the first routine 
;WITH userObject AS  ( SELECT   Name AS DataName,--the actual name of the parameter or column ('@' removed)  --and the qualified object name of the routine  OBJECT_SCHEMA_NAME(ObjectID) + '.' + OBJECT_NAME(ObjectID) AS ObjectName,  --now the harder bit: the definition of the datatype.  TypeName + ' '     + CASE     --we may have to put in the length. e.g. CHAR (10)           WHEN TypeName IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN MaxLength = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN TypeName IN ('nchar', 'nvarchar')                      THEN MaxLength / 2 ELSE MaxLength                    END)                END + ')'         WHEN TypeName IN ('decimal', 'numeric')--a BCD number!             THEN '(' + CONVERT(VARCHAR(4), Precision)                   + ',' + CONVERT(VARCHAR(4), Scale) + ')'         ELSE ''      END  --we've done with putting in the length      + CASE WHEN XML_collection_ID <> 0 --tush tush. XML         THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                    THEN 'DOCUMENT '                    ELSE 'CONTENT '                   END              + COALESCE(               (SELECT TOP 1 QUOTENAME(ss.name) + '.' + QUOTENAME(sc.Name)                FROM sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE sc.xml_collection_ID = XML_collection_ID),'NULL') + ')'          ELSE ''         END        AS [DataType],       DataObjectType  FROM   (Select t.name AS TypeName, REPLACE(c.name,'@','') AS Name,          c.max_length AS MaxLength, c.precision AS [Precision],           c.scale AS [Scale], c.[Object_id] AS ObjectID, XML_collection_ID,          is_XML_Document,'P' AS DataobjectType  FROM sys.parameters c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  AND parameter_id>0  UNION all  Select t.name AS TypeName, c.name AS Name, c.max_length AS MaxLength,          c.precision AS [Precision], c.scale AS [Scale],          c.[Object_id] AS ObjectID, XML_collection_ID,is_XML_Document,          'C' AS DataobjectType            FROM sys.columns c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID   WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys'  )f)SELECT CONVERT(CHAR(80),objectName+'.'   + CASE WHEN DataobjectType ='P' THEN '@' ELSE '' END + DataName),DataType FROM UserObjectWHERE DataName IN   (SELECT DataName FROM UserObject   GROUP BY DataName    HAVING MIN(Datatype)<>MAX(DataType))ORDER BY DataName     Hmm. I can tell you I found quite a few minor issues with the various 
tabases I tested this on, and found some potential bugs that really leap out at you from the results. Here is the 
start of the result for AdventureWorks. Yes, AccountNumber is, for some reason, a Varchar(10) in the Customer table. 
Hmm. odd. Why is a city fifty characters long in that view?  The idea of the description of a colour being 256 
characters long seems over-ambitious. Go down the list and you'll spot other mistakes. There are no bugs, but just mess.
We started out with a listing to examine parameters, then we mixed parameters 
and columns. Our last listing is for a slightly more in-depth look at table columns. You’ll notice that we’ve 
delibarately removed the indication of whether a column is persisted, or is an identity column because that gives us 
false positives for our code smells. If you just want to browse your metadata for other reasons (and it can quite help 
in some circumstances) then uncomment them! 
;WITH userColumns AS  ( SELECT   c.NAME AS columnName,  OBJECT_SCHEMA_NAME(c.object_ID) + '.' + OBJECT_NAME(c.object_ID) AS ObjectName,  REPLACE(t.name + ' '   + CASE WHEN is_computed = 1 THEN ' AS ' + --do DDL for a computed column          (SELECT definition FROM sys.computed_columns cc           WHERE cc.object_id = c.object_id AND cc.column_ID = c.column_ID)     --we may have to put in the length            WHEN t.Name IN ('char', 'varchar', 'nchar', 'nvarchar')             THEN '('               + CASE WHEN c.Max_Length = -1 THEN 'MAX'                ELSE CONVERT(VARCHAR(4),                    CASE WHEN t.Name IN ('nchar', 'nvarchar')                      THEN c.Max_Length / 2 ELSE c.Max_Length                    END)                END + ')'       WHEN t.name IN ('decimal', 'numeric')       THEN '(' + CONVERT(VARCHAR(4), c.precision) + ',' + CONVERT(VARCHAR(4), c.Scale) + ')'       ELSE ''      END + CASE WHEN c.is_rowguidcol = 1          THEN ' ROWGUIDCOL'          ELSE ''         END + CASE WHEN XML_collection_ID <> 0            THEN --deal with object schema names             '(' + CASE WHEN is_XML_Document = 1                THEN 'DOCUMENT '                ELSE 'CONTENT '               END + COALESCE((SELECT                QUOTENAME(ss.name) + '.' + QUOTENAME(sc.name)                FROM                sys.xml_schema_collections sc                INNER JOIN Sys.Schemas ss ON sc.schema_ID = ss.schema_ID                WHERE                sc.xml_collection_ID = c.XML_collection_ID),                'NULL') + ')'            ELSE ''           END + CASE WHEN is_identity = 1             THEN CASE WHEN OBJECTPROPERTY(object_id,                'IsUserTable') = 1 AND COLUMNPROPERTY(object_id,                c.name,                'IsIDNotForRepl') = 0 AND OBJECTPROPERTY(object_id,                'IsMSShipped') = 0                THEN ''                ELSE ' NOT FOR REPLICATION '               END             ELSE ''            END + CASE WHEN c.is_nullable = 0               THEN ' NOT NULL'               ELSE ' NULL'              END + CASE                WHEN c.default_object_id <> 0                THEN ' DEFAULT ' + object_Definition(c.default_object_id)                ELSE ''               END + CASE                WHEN c.collation_name IS NULL                THEN ''                WHEN c.collation_name <> (SELECT                collation_name                FROM                sys.databases                WHERE                name = DB_NAME()) COLLATE Latin1_General_CI_AS                THEN COALESCE(' COLLATE ' + c.collation_name,                '')                ELSE ''                END,'  ',' ') AS [DataType]FROM sys.columns c  INNER JOIN sys.types t ON c.user_Type_ID = t.user_Type_ID  WHERE OBJECT_SCHEMA_NAME(c.object_ID) <> 'sys')SELECT CONVERT(CHAR(80),objectName+'.'+columnName),DataType FROM UserColumnsWHERE columnName IN (SELECT columnName FROM UserColumns 
 GROUP BY columnName 
 HAVING MIN(Datatype)<>MAX(DataType))ORDER BY columnName If you take a look down the results against Adventureworks, you'll see once again that there are things to 
investigate, mostly, in the illustration, discrepancies between null and non-null datatypes
So I here you ask, what about temporary variables 
within routines? If ever there was a source of elusive bugs, you'll find it there. Sadly, these temporary variables are 
not stored in the metadata so we'll have to find a more subtle way of flushing these out, and that will, I'm afraid, 
have to wait!