Parsing T-SQL – The easy way
- by Dave Ballantyne
Every once in a while, I hit an issue that would require me to interrogate/parse some T-SQL code.  Normally, I would shy away from this and attempt to solve the problem in some other way.  I have written parsers before in the the past using LEX and YACC, and as much fun and awesomeness that path is,  I couldnt justify the time it would take.  However, this week I have been faced with just such an issue and at the back of my mind I can remember reading through the SQLServer 2012 feature pack and seeing something called “Microsoft SQL Server 2012 Transact-SQL Language Service “.  This is described there as :  “The SQL Server Transact-SQL Language Service is a component based on the .NET Framework which provides parsing validation and IntelliSense services for Transact-SQL for SQL Server 2012, SQL Server 2008 R2, and SQL Server 2008. “  Sounds just what I was after.  Documentation is very scant on this so dont take what follows as best practice or best use, just a practice and a use.  Knowing what I was sort of looking for something, I found the relevant assembly in the gac which is the simply named ,’Microsoft.SqlServer.Management.SqlParser’.  Even knowing that you wont find much in terms of documentation if you do a web-search, but you will find the MSDN documentation that list the members and methods etc…  The “scanner”  class sounded the most appropriate for my needs as that is described as “Scans Transact-SQL searching for individual units of code or tokens.”.  After a bit of poking, around the code i ended up with was something like  [System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Management.SqlParser") | Out-Null
$ParseOptions = New-Object Microsoft.SqlServer.Management.SqlParser.Parser.ParseOptions
$ParseOptions.BatchSeparator = 'GO'
$Parser = new-object Microsoft.SqlServer.Management.SqlParser.Parser.Scanner($ParseOptions)
$Sql = "Create Procedure MyProc as Select top(10) * from dbo.Table"
$Parser.SetSource($Sql,0)
$Token=[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::TOKEN_SET
$Start =0
$End = 0
$State =0 
$IsEndOfBatch = $false
$IsMatched = $false
$IsExecAutoParamHelp = $false
while(($Token = $Parser.GetNext([ref]$State ,[ref]$Start, [ref]$End, [ref]$IsMatched, [ref]$IsExecAutoParamHelp ))-ne [Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]::EOF) {
    try{
        ($TokenPrs =[Microsoft.SqlServer.Management.SqlParser.Parser.Tokens]$Token) | Out-Null
        $TokenPrs
        $Sql.Substring($Start,($end-$Start)+1)
    }catch{
        $TokenPrs = $null
    }    
}
As you can see , the $Sql variable holds the sql to be parsed , that is pushed into the $Parser object using SetSource,  and then we will use GetNext until the EOF token is returned.  GetNext will also return the Start and End character positions within the source string of the parsed text.
This script’s output is :
TOKEN_CREATE
  Create
  TOKEN_PROCEDURE
  Procedure
  TOKEN_ID
  MyProc
  TOKEN_AS
  as
  TOKEN_SELECT
  Select
  TOKEN_TOP
  top
  TOKEN_INTEGER
  10
  TOKEN_FROM
  from
  TOKEN_ID
  dbo
  TOKEN_TABLE
  Table
  
note that the ‘(‘, ‘)’  and ‘*’ characters have returned a token type that is not present in the Microsoft.SqlServer.Management.SqlParser.Parser.Tokens Enum that has caused an error which has been caught in the catch block.  
Fun, Fun ,Fun , Simple T-SQL Parsing.  Hope this helps someone in the same position,  let me know how you get on.