Performance problems loading XML with SSIS, an alternative way!

Posted by AtulThakor on SQL Blogcasts See other posts from SQL Blogcasts or by AtulThakor
Published on Wed, 02 Feb 2011 00:19:00 GMT Indexed on 2011/02/02 15:29 UTC
Read the original article Hit count: 430

Filed under:
|
|
|
|

I recently needed to load several thousand XML files into a SQL database, I created an SSIS package which was created as followed:

Using a foreach container to loop through a directory and load each file path into a variable, the “Import XML” dataflow would then load each XML file into a SQL table.

 

 

image

 

Running this, it took approximately 1 second to load each file which seemed a massive amount of time to parse the XML and load the data, speaking to my colleague Martin Croft, he suggested the use of T-SQL Bulk Insert and OpenRowset, so we adjusted the package as followed:

 

image

 

The same foreach container was used but instead the following SQL command was executed (this is an expression):

 

 

"INSERT INTO MyTable(FileDate)
SELECT   CAST(bulkcolumn AS XML)
    FROM OPENROWSET(
        BULK
        '" + @[User::CurrentFile]  + "',
        SINGLE_BLOB ) AS x"

 

 

Using this method we managed to load approximately 20 records per second, much faster…for data loading!

For what we wanted to achieve this was perfect but I’ll leave you with the following points when making your own decision on which solution you decide to choose! 

 

 

Openrowset Method

  • Much faster to get the data into SQL
  • You’ll need to parse or create a view over the XML data to allow the data to be more usable(another post on this!)
  • Not able to apply validation/transformation against the data when loading it
  • The SQL Server service account will need permission to the file
  • No schema validation when loading files

SSIS

  • Slower (in our case)
  • Schema validation
  • Allows you to apply transformations/joins to the data
  • Permissions should be less of a problem
  • Data can be loaded into the final form through the package
  • When using a schema validation errors can fail the package (I’ll do another post on this)

© SQL Blogcasts or respective owner

Related posts about ssis

Related posts about Performance