How to archive data from a table to a local or remote database in SQL 2005 and SQL 2008
- by simonsabin
Often you have the need to archive data from a 
table.
This leads to a number of challenges
1. How can you do it without impacting users
2. How can I make it transactionally consistent, i.e. the data I put in the 
archive is the data I remove from the main table
3. How can I get it to perform well
Points 1 is very much tied to point 3. If it doesn't perform well then 
the delete of data is going to cause lots of locks and thus potentially 
blocking.
For points 1 and 3 refer to my previous posts DELETE-TOP-x-rows-avoiding-a-table-scan and 
UPDATE-and-DELETE-TOP-and-ORDER-BY---Part2. 
In essence you need to be removing small chunks of data from your table and you 
want to do that avoiding a table scan.
So that deals with the delete approach but archiving is about inserting that 
data somewhere else.
Well in SQL 2008 they introduced a new feature INSERT over DML (Data 
Manipulation Language, i.e. SQL statements that change data), or composable DML. 
The ability to nest DML statements within themselves, so you can past the 
results of an insert to an update to a merge. I've mentioned this before here SQL-Server-2008---MERGE-and-optimistic-concurrency. 
This feature is currently limited to being able to consume the results of a DML 
statement in an INSERT statement. There are many restrictions which you can find 
here http://msdn.microsoft.com/en-us/library/ms177564.aspx look 
for the section "Inserting Data Returned From an OUTPUT Clause Into a Table"
Even with the restrictions what we can do is consume the OUTPUT from a DELETE 
and INSERT the results into a table in another database. Note that in BOL it 
refers to not being able to use a remote table, remote means a table on another 
SQL instance.
To show this working use this SQL to setup two databases foo and 
fooArchive
create database foo
go
--create 
the source table fred in database foo
select * into foo..fred from sys.objects
go
create database fooArchive
go
if object_id('fredarchive',DB_ID('fooArchive')) is null
begin
    select getdate() 
ArchiveDate,* into fooArchive..FredArchive from 
sys.objects where 
1=2
 
    end
go
And then we can use this simple statement to archive the data
insert into fooArchive..FredArchive 
select getdate(),d.*
from 
(delete top (1) 
        
from foo..Fred
        
output deleted.*) d
        
go
In this statement the delete can be any delete statement you wish so if you 
are deleting by ids or a range of values then you can do that. Refer to the DELETE-TOP-x-rows-avoiding-a-table-scan post 
to ensure that your delete is going to perform. The last thing you want to do is 
to perform 100 deletes each with 5000 records for each of those deletes to do a 
table scan.
For a solution that works for SQL2005 or if you want to archive to a 
different server then you can use linked servers or SSIS. This example 
shows how to do it with linked servers. [ONARC-LAP03] is the source server.
begin transaction
insert into fooArchive..FredArchive 
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1) 
             
       from 
foo..Fred
                    
output deleted.*') 
d
commit 
transaction
and to prove the 
transactions work try, you should get the same number of records before and 
after.
select 
(select count(1) from foo..Fred) 
fred
       ,(select COUNT(1) from 
fooArchive..FredArchive ) fredarchive
 
begin transaction
insert into fooArchive..FredArchive 
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1) 
                    
from foo..Fred
                    
output deleted.*') 
d
rollback transaction
 
select 
(select count(1) from foo..Fred) 
fred
       ,(select COUNT(1) from 
fooArchive..FredArchive ) fredarchive
The transactions are very important with this solution. Look what happens 
when you don't have transactions and an error occurs
 
select 
(select count(1) from foo..Fred) 
fred
       ,(select COUNT(1) from 
fooArchive..FredArchive ) fredarchive
 
insert into fooArchive..FredArchive 
select getdate(),d.*
from openquery ([ONARC-LAP03],'delete top (1) 
                    
from foo..Fred
                    
output deleted.*
                    
raiserror (''Oh doo doo'',15,15)') 
d
                    
select 
(select count(1) from foo..Fred) 
fred
       ,(select COUNT(1) from 
fooArchive..FredArchive ) fredarchive
Before running this think what the result would be. I got it wrong. 
What seems to happen is that the remote query is executed as a transaction, 
the error causes that to rollback. However the results have already been sent to 
the client and so get inserted into the