How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008

Posted by simonsabin on SQL Blogcasts See other posts from SQL Blogcasts or by simonsabin
Published on Sat, 06 Mar 2010 09:12:44 GMT Indexed on 2010/03/11 17:20 UTC
Read the original article Hit count: 465

Filed under:

Often you have the need to archive data from a table.

This leads to a number of challenges

1. How can you do it without impacting users

2. How can I make it transactionally consistent, i.e. the data I put in the archive is the data I remove from the main table

3. How can I get it to perform well

Points 1 is very much tied to point 3. If it doesn't perform well then the delete of data is going to cause lots of locks and thus potentially blocking.

For points 1 and 3 refer to my previous posts DELETE-TOP-x-rows-avoiding-a-table-scan and UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2. In essence you need to be removing small chunks of data from your table and you want to do that avoiding a table scan.

So that deals with the delete approach but archiving is about inserting that data somewhere else.

Well in SQL 2008 they introduced a new feature INSERT over DML (Data Manipulation Language, i.e. SQL statements that change data), or composable DML. The ability to nest DML statements within themselves, so you can past the results of an insert to an update to a merge. I've mentioned this before here SQL-Server-2008---MERGE-and-optimistic-concurrency. This feature is currently limited to being able to consume the results of a DML statement in an INSERT statement. There are many restrictions which you can find here http://msdn.microsoft.com/en-us/library/ms177564.aspx look for the section "Inserting Data Returned From an OUTPUT Clause Into a Table"

Even with the restrictions what we can do is consume the OUTPUT from a DELETE and INSERT the results into a table in another database. Note that in BOL it refers to not being able to use a remote table, remote means a table on another SQL instance.

To show this working use this SQL to setup two databases foo and fooArchive

create database foo

go

--create the source table fred in database foo

select * into foo..fred from sys.objects

go

create database fooArchive

go

if object_id('fredarchive',DB_ID('fooArchive')) is null

begin

    select getdate() ArchiveDate,* into fooArchive..FredArchive from sys.objects where 1=2

 

    end

go

And then we can use this simple statement to archive the data

insert into fooArchive..FredArchive

select getdate(),d.*

from (delete top (1)

        from foo..Fred

        output deleted.*) d

        go

In this statement the delete can be any delete statement you wish so if you are deleting by ids or a range of values then you can do that. Refer to the DELETE-TOP-x-rows-avoiding-a-table-scan post to ensure that your delete is going to perform. The last thing you want to do is to perform 100 deletes each with 5000 records for each of those deletes to do a table scan.

For a solution that works for SQL2005 or if you want to archive to a different server then you can use linked servers or SSIS. This example shows how to do it with linked servers. [ONARC-LAP03] is the source server.

begin transaction

insert into fooArchive..FredArchive

select getdate(),d.*

from openquery ([ONARC-LAP03],'delete top (1)

                    from foo..Fred

                    output deleted.*') d

commit transaction

and to prove the transactions work try, you should get the same number of records before and after.

select (select count(1) from foo..Fred) fred

       ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive

 

begin transaction

insert into fooArchive..FredArchive

select getdate(),d.*

from openquery ([ONARC-LAP03],'delete top (1)

                    from foo..Fred

                    output deleted.*') d

rollback transaction

 

select (select count(1) from foo..Fred) fred

       ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive

The transactions are very important with this solution. Look what happens when you don't have transactions and an error occurs

 

select (select count(1) from foo..Fred) fred

       ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive

 

insert into fooArchive..FredArchive

select getdate(),d.*

from openquery ([ONARC-LAP03],'delete top (1)

                    from foo..Fred

                    output deleted.*

                    raiserror (''Oh doo doo'',15,15)') d

                   

select (select count(1) from foo..Fred) fred

       ,(select COUNT(1) from fooArchive..FredArchive ) fredarchive

Before running this think what the result would be. I got it wrong.

What seems to happen is that the remote query is executed as a transaction, the error causes that to rollback. However the results have already been sent to the client and so get inserted into the


© SQL Blogcasts or respective owner