Use the power of regular expressions to cleanse your data right there inside the Data Flow. This transformation includes a full user interface for simple configuration, as well as advanced features such as error output configuration. 
Two regular expressions are used: a match expression and a replace expression. The transformation is designed around named capture groups (also called match groups) and supports multiple expressions. This allows rich and complex expressions to be built in an easily reusable transformation, where previously a bespoke Script Component was the only alternative. 
Some simple properties are available for each column selected:
Behaviour
The two behaviour modes offer similar functionality, with one difference. Replace substitutes the matched tokens within the input string, whereas Emit overwrites the whole string with the result of the replace expression. 
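The difference between the two modes can be sketched outside SSIS. The following Python snippet is only an illustration of the idea, not the component's implementation: the function names are hypothetical, the handling of non-matching rows is an assumption, and Python's `re` module writes named groups as `(?P<name>...)` and references them as `\g<name>`, which differs from the syntax the transformation itself uses.

```python
import re

def apply_replace(pattern: str, template: str, value: str) -> str:
    """Replace mode (as assumed here): only the matched text is substituted."""
    return re.sub(pattern, template, value)

def apply_emit(pattern: str, template: str, value: str) -> str:
    """Emit mode (as assumed here): the output is the expanded template alone."""
    match = re.search(pattern, value)
    if match is None:
        # Assumption: a value that does not match is passed through unchanged.
        return value
    return match.expand(template)

pattern = r"(?P<area>\d{3})-(?P<number>\d{4})"
template = r"(\g<area>) \g<number>"
value = "Call 555-1234 today"

print(apply_replace(pattern, template, value))  # Call (555) 1234 today
print(apply_emit(pattern, template, value))     # (555) 1234
```

With Replace the surrounding text survives; with Emit only the assembled groups remain.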
Cascade
Cascade allows you to define multiple expressions, each on a new line. The match expression is split into one operation per line, and the operations are then processed in order at run-time. Multiple replace expressions can also be specified, again each on a new line. If a match expression line has no corresponding replace expression, the last replace expression is used instead. It is common to have multiple match expressions but only a single replace expression. 
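The pairing rule described above can be sketched in a few lines of Python. This is a rough model under stated assumptions: the `cascade` function is hypothetical, it assumes each operation works on the result of the previous one, and it uses Replace-style substitution throughout.

```python
import re

def cascade(match_lines: str, replace_lines: str, value: str) -> str:
    """Apply one regex operation per match line, in order (illustrative only)."""
    patterns = [line for line in match_lines.splitlines() if line]
    # Assumption: an empty replace box acts as a single empty replacement.
    templates = replace_lines.splitlines() or [""]
    for index, pattern in enumerate(patterns):
        # No corresponding replace line: fall back to the last one supplied.
        template = templates[min(index, len(templates) - 1)]
        value = re.sub(pattern, template, value)
    return value

# Two match expressions, one replace expression shared by both.
match_lines = "\n".join([r"[\t\r\n]+", r" {2,}"])
replace_lines = " "

print(cascade(match_lines, replace_lines, "a\t\tb    c"))  # a b c
```

Here the first line collapses tabs and line breaks, the second collapses runs of spaces, and both reuse the single replace expression.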
Match Expression
The expression used to define the named capture groups. This is where you analyse the data and tag, or name, the elements that the match expression finds within it. 
Replace Expression
The replace expression determines the final output. It references the named groups from the match expression and assembles them into the final output. 
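To make the division of labour concrete, here is a small Python sketch in which a match expression names the parts of a date and a replace expression reassembles them. The date format and variable names are invented for illustration, and Python's `(?P<name>...)` and `\g<name>` notation stands in for the transformation's own syntax.

```python
import re

# Match expression: tag the elements of an ISO-style date with names.
match_expression = r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})"
# Replace expression: assemble the named groups into the final output.
replace_expression = r"\g<day>/\g<month>/\g<year>"

m = re.search(match_expression, "2008-01-28")
groups = m.groupdict()                 # the named elements found in the data
output = m.expand(replace_expression)  # the reassembled output

print(groups)  # {'year': '2008', 'month': '01', 'day': '28'}
print(output)  # 28/01/2008
```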
If you want to use regular expressions to validate data, then try the Regular Expression Transformation. 
Quick Start Guide
Select a column. A new output column is created for each selected column; there is no option for in-place replacement of column values. One input column can be used to populate multiple output columns: select the column again in the lower grid, using the Input Columns drop-down selector. 
Amend the output column name and size as required. They default to those of the selected input column. 
Amend the behaviour as required; the default is Replace. 
Amend the cascade option as required; the default is true. 
Finally, enter your match and replace regular expressions. 
Quick Sample #1
Parse an email address and extract the user and domain portions. Format as a web address passing the user portion as a URL parameter. This uses two match groups, user and host, which correspond to the text before the @ and after it respectively. 
Behaviour is Emit, and cascade is false because there is only a single match expression. 
Match Expression - ^(?<user>[^@]+)@(?<host>.+)$ 
Replace Expression - http://www.${host}?user=${user} 
Results 
Sample Input                  Sample Output
zheng0@adventure-works.com    http://www.adventure-works.com?user=zheng0
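For readers who want to try the sample expressions outside SSIS, the same transformation can be approximated in Python. This is a sketch rather than the component's code: the `emit` helper is hypothetical, returning `None` for non-matching input is an assumption, and the expressions are rewritten in Python's notation (`(?P<name>...)` for named groups and `\g<name>` for references).

```python
import re

# Sample match expression, with named groups in Python notation.
MATCH = r"^(?P<user>[^@]+)@(?P<host>.+)$"
# Sample replace expression, with group references in Python notation.
REPLACE = r"http://www.\g<host>?user=\g<user>"

def emit(value: str):
    """Emit-style output: return only the assembled replace expression."""
    m = re.search(MATCH, value)
    # Assumption: input that is not an email address yields None.
    return m.expand(REPLACE) if m else None

print(emit("zheng0@adventure-works.com"))
# http://www.adventure-works.com?user=zheng0
```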
The component is provided as an MSI file; however, to complete the installation you will have to add the transformation to the Visual Studio toolbox manually. Right-click the toolbox and select Choose Items.... Select the SSIS Data Flow Items tab, and then check the RegexClean Transformation in the list. 
Downloads
The RegexClean Transformation is available for both SQL Server 2005 and SQL Server 2008. Please choose the version to match your SQL Server version, or you can install both versions and use them side by side if you have both SQL Server 2005 and SQL Server 2008 installed. 
RegexClean Transformation for SQL Server 2005 
RegexClean Transformation for SQL Server 2008 
Version History
SQL Server 2005
Version 1.0.0.105 - Public Release (28 Jan 2008) 