Guide to reduce TFS database growth using the Test Attachment Cleaner

Posted by terje on Geeks with Blogs
Published on Tue, 15 Nov 2011 13:35:03 GMT

Recently there have been several reports of TFS databases growing too fast and too big.  Notably, this has been observed when one starts to use more features of the testing system.  Also, TFS 2010 handles test results differently from TFS 2008, which leads to more data being stored in the TFS databases. As a consequence, some tools have been released to remove unneeded data from the database, along with fixes for bugs that were found during this process.  Further, some preventive practices and maintenance rules should be adopted.

A lot of people have blogged about this, among them:

Anu’s very important blog post here describes both the problem and solutions to handle it.  She describes both the Test Attachment Cleaner tool, and also some QFE/CU releases to fix some underlying bugs which prevented the tool from being fully effective.

Brian Harry’s blog post here describes the problem too

This forum thread describes the problem with some solution hints.

Ravi Shanker’s blog post here describes best practices on solving this (TBP)

Grant Holliday’s blog post here describes strategies for using the Test Attachment Cleaner both to detect space problems and to rectify them.

 

The problem can be divided into the following areas:

  • Publishing of test results from builds
  • Publishing of manual test results and their attachments in particular
  • Publishing of deployment binaries for use during a test run
  • Bugs in SQL Server preventing total cleanup of data

(All of the data above is published into the TFS database as attachments.)

The test results include all the data collected during the run.  Some of this data, like IntelliTrace logs and video recordings, can grow rather large.  The binaries pushed for automated test runs also contribute a lot to the size of the attached data; this includes tests run during a build with code coverage enabled, which will include all the files in the deployment folder.

 

In order to handle this systematically, I have set up a 3-stage process:

  1. Find out if you have a database space issue
  2. Set up your TFS server to minimize potential database issues
  3. If you have the “problem”, clean up the database and otherwise keep it clean

 

Analyze the data

Are your database(s) growing?  Are unused test results growing out of proportion?

To find out, you need to query your TFS database for some of the information, and use the Test Attachment Cleaner (TAC) to obtain more detailed information.

If you don’t have too many databases you can use the SQL Server reports from within Management Studio to analyze the database and table sizes. Or you can use a set of queries. I often find queries faster to use because I can tweak them the way I want.  But be aware that these queries rely on undocumented and unsupported table structures, which may change whenever the product team wants to change them.

If you have multiple Project Collections, find out which might have problems:

(Disclaimer: The queries below work on TFS 2010. They will not work on Dev-11, since the table structure has been changed.  I will try to update them for Dev-11 when it is released.)

Open a SQL Server Management Studio session on the SQL Server where you have your TFS databases.

Use the query below to find the Project Collection databases and their sizes, in descending size order. 

 
use master
-- size is counted in 8 KB pages, so size/128 gives MB; type=0 selects data files
select DB_NAME(database_id) as DBName, (size/128) as SizeInMB
from sys.master_files
where type=0
  and substring(DB_NAME(database_id),1,4)='Tfs_'
  and DB_NAME(database_id)<>'Tfs_Configuration'
order by size desc

Doing this on one of our SQL servers gives the following results:

image

It is pretty easy to see which collection to start working on :-)

 

Find out which tables are possibly too large

Keep a special watch on the attachment tables, tbl_AttachmentContent in particular.

Use the script at the bottom of Grant’s blog to find the table sizes in descending size order.

In our case we got this result:

image

From Grant’s blog we learnt that tbl_Content is in the Version Control category, so the only major issue we have here is tbl_AttachmentContent.

 

Find out which team projects have possibly too large attachments

In order to use the TAC to find and eventually delete attachment data we need to find out which team projects have these attachments. The team project is a required parameter to the TAC.

Use the following query to find this; replace the collection database name with whatever applies in your case:

 

use Tfs_DefaultCollection
select  p.projectname, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a 
inner join tbl_testrun as tr on a.testrunid=tr.testrunid 
inner join tbl_project as p on p.projectid=tr.projectid
group by p.projectname
order by sum(a.compressedlength) desc

In our case we got this result (had to remove some names), out of more than 100 team projects accumulated over quite some years:

image

As can be seen here, it is pretty obvious that “Byggtjeneste – Projects” is the main team project to take care of, with the ones on lines 2-4 as the next ones.

Check which attachment types take up the most space

It can be nice to know which attachment types take up the space, so run the following query:

use Tfs_DefaultCollection
select  a.attachmenttype, sum(a.compressedlength)/1024/1024 as sizeInMB from dbo.tbl_Attachment as a 
inner join tbl_testrun as tr on a.testrunid=tr.testrunid 
inner join tbl_project as p on p.projectid=tr.projectid
group by a.attachmenttype
order by sum(a.compressedlength) desc

We then got this result:

image

From this it is pretty obvious that the problem here is the binary files, as also mentioned in Anu’s blog.

Check which file types, by their extension, take up the most space

Run the following query:

use Tfs_DefaultCollection
select  SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)as Extension, sum(compressedlength)/1024 as SizeInKB from tbl_Attachment
group by SUBSTRING(filename,len(filename)-CHARINDEX('.',REVERSE(filename))+2,999)
order by sum(compressedlength) desc

This gives a result like this:

image
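The extension expression in the query above is a little dense. As a rough illustration only (this is not part of TFS or the TAC), the same logic can be mirrored in Python. The function name `extension_like_sql` is made up for this sketch, and it assumes file names without trailing spaces, which T-SQL's LEN would ignore.

```python
def extension_like_sql(filename: str) -> str:
    """Mirror SUBSTRING(f, LEN(f) - CHARINDEX('.', REVERSE(f)) + 2, 999).

    Hypothetical helper for illustration only; assumes no trailing spaces.
    """
    # CHARINDEX is 1-based and returns 0 when the character is not found.
    pos_from_end = filename[::-1].find(".") + 1
    # 1-based start position of the text after the last dot.
    start = len(filename) - pos_from_end + 2
    # Convert to a 0-based slice; a start past the end yields ''.
    return filename[start - 1:start - 1 + 999]


if __name__ == "__main__":
    for name in ("MyTests.dll", "run.trace.iTrace", "README"):
        print(repr(name), "->", repr(extension_like_sql(name)))
```

Note that a file name without any dot ends up in the group with an empty extension, so that group in the result collects all extension-less attachments.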

 

Now you should have collected enough information to tell you what to do, if anything, as well as some of the information you need to set up your TAC settings file, both for a cleanup and for scheduled maintenance later.

 

Get your TFS server and environment properly set up

Whether or not you already have the problem, you should ensure that the TFS server is set up so that the risk of running into it is minimized.  To ensure this you should install the following set of updates and components.

The assumption is that your TFS Server is at SP1 level.

  • Install the QFE for KB2608743 – which also contains detailed instructions on its use, download from here. The QFE changes the default settings to not upload deployed binaries, which are used in automated test runs.
    • Binaries will still be uploaded if:
      • Code coverage is enabled in the test settings.
      • You change the UploadDeploymentItem to true in the testsettings file.
        • Be aware that this might be reset back to false by another user who hasn't installed this QFE.
    • The hotfix should be installed to
      • The build servers (the build agents)
      • The machine hosting the Test Controller
      • Local development computers (Visual Studio)
      • Local test computers (MTM)
    • It is not required to install it to the TFS Server, test agents or the build controller – it has no effect on these programs.
  • If you use the SQL Server 2008 R2 you should also install the CU 10 (or later).  This CU fixes a potential problem of hanging “ghost” files.  This seems to happen only in certain trigger situations, but to ensure it doesn’t bite you, it is better to make sure this CU is installed.
    • There is no such CU for SQL Server 2008 pre-R2
      • Workaround:  If you suspect hanging ghost files, they can, with some mental effort, be deduced from the ghost counters using the following SQL query:
        use master
        -- <TableName> must be a three-part name (database.schema.table), since the session is in master
        SELECT DB_NAME(database_id) as 'database',OBJECT_NAME(object_id, database_id) as 'objectname',
        index_type_desc,ghost_record_count,version_ghost_record_count,record_count,avg_record_size_in_bytes 
        FROM sys.dm_db_index_physical_stats (DB_ID(N'<DatabaseName>'), OBJECT_ID(N'<TableName>'), NULL, NULL , 'DETAILED')
      • The problem is a stalled ghost cleanup process.  To fix it, first stop all components that depend on the SQL Server, such as the TFS Server and SPS services (that is, all applications that connect to the SQL Server). Then restart the SQL Server, and finally start all the dependent processes again.  (I would guess a complete server reboot would do the trick too.) After this the ghost cleanup process will run properly again.
    • The fix will come in the next CU cycle for SQL Server R2 SP1.  The R2 pre-SP1 and R2 SP1 have separate maintenance cycles, and are maintained individually. Each has its own set of CUs. When it comes I will add the link here to that CU.
    • The “hanging ghost file” issue came up after running the TAC and deleting enormous amounts of data.  The SQL Server can get into this hanging state (without the QFE) in certain cases due to this.
  • And of course, install and set up the Test Attachment Cleaner command line power tool.  This should be done following some guidelines from Ravi Shanker:
    • “When you run TAC, ensure that you are deleting small chunks of data at regular intervals (say run TAC every night at 3AM to delete data that is between age 730 to 731 days) – this will ensure that small amounts of data are being deleted and SQL ghosted record cleanup can catch up with the number of deletes performed.”
      • This rule minimizes the risk of the ghosted hang problem occurring, and further makes it easier for the SQL Server ghost cleanup process to work smoothly.
    • “Run DBCC SHRINKDB post the ghosted records are cleaned up to physically reclaim the space on the file system”
      • This is the last step in a 3-step process of removing SQL Server data. First the records are logically deleted. Then they are cleaned out by the ghost cleanup process, and finally the space is reclaimed using the shrink command (DBCC SHRINKDATABASE).

Cleaning out the attachments

The TAC is run from the command line using a set of parameters and is controlled by a settings file.  The parameters specify a server URI, including the team project collection, and a specific team project. So in order to run this for multiple team projects regularly, one has to set up a script that runs the TAC multiple times, once for each team project.  When you install the TAC there is a very useful readme file in the same directory.
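A script that runs the TAC once per team project could be sketched along these lines. This is only a minimal sketch: it uses just the tcmpt parameters shown in this post, and the collection URL, project names, and file locations are hypothetical placeholders. With `dry_run=True` it only prints the command lines, so the list can be checked before anything is deleted.

```python
import subprocess
from pathlib import Path


def build_tac_command(collection_url, team_project, settings_file, log_dir):
    """Build one tcmpt command line for a single team project."""
    log_file = Path(log_dir) / f"{team_project}.tcmpt.log"
    return [
        "tcmpt", "attachmentcleanup",
        f"/collection:{collection_url}",
        f"/teamproject:{team_project}",
        f"/settingsfile:{settings_file}",
        f"/outputfile:{log_file}",
        "/mode:delete",
    ]


def run_for_projects(collection_url, team_projects, settings_file, log_dir,
                     dry_run=True):
    """Run the TAC once per team project, or just print the commands."""
    commands = [build_tac_command(collection_url, project, settings_file, log_dir)
                for project in team_projects]
    for cmd in commands:
        if dry_run:
            # list2cmdline quotes arguments that contain spaces.
            print(subprocess.list2cmdline(cmd))
        else:
            # check=True stops the loop if tcmpt reports a failure.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_for_projects(
        "http://tfs.example.local:8080/tfs/DefaultCollection",  # hypothetical URL
        ["ProjectA", "ProjectB"],                                # hypothetical names
        "settings.xml",
        "logs",
    )
```

The project list can be taken from the "size per team project" query in the analysis section, so that only the projects that actually hold large attachments are processed.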

When the deployment binaries are published to the TFS server, ALL items from the deployment folder are published. That often means many more files than you would assume are necessary. This is a brute force technique. It works, but you need to take care when cleaning up.

Grant has shown what their settings file looks like in his blog post; it removes all attachments older than 180 days, as long as there are no active work items connected to them. This setting can be useful for cleaning out all items, both in a one-time cleanup operation and in general scheduled maintenance.

There are two scenarios we need to consider:

  1. Cleaning up an existing overgrown database
  2. Maintaining a server to avoid an overgrown database using scheduled TAC

 

1. Cleaning up a database which has grown too big due to these attachments.

This is a one-time job.  We do this once and then move on to make sure it won’t happen again, by taking the actions in 2) below.  In this scenario you should consider only the large files. Your goal should be simply to reduce the size, without bothering about the smaller stuff. That can be left to a scheduled TAC cleanup (2 below).

Here you can use a very general settings file and just remove the large attachments, or you can choose to remove any old items.  Grant’s settings file is an example of the latter.  A settings file to remove only large attachments could look like this:

<!-- Scenario : Remove large files -->
<DeletionCriteria>
    <TestRun />
    <Attachment>
        <SizeInMB GreaterThan="10" />
    </Attachment>
</DeletionCriteria>

If you want to remove only DLLs and PDBs above that size, add an Extensions section.  Without that section, attachments with any extension will be deleted.  The file could then look like this:

<!-- Scenario : Remove large files of type dll's and pdb's -->
<DeletionCriteria>
    <TestRun />
    <Attachment>
        <SizeInMB GreaterThan="10" />
        <Extensions>
            <Include value="dll" />
            <Include value="pdb" />
        </Extensions>
    </Attachment>
</DeletionCriteria>

Before you start up your scheduled maintenance, you should clear out all older items.

2. Scheduled maintenance using the TAC

Here you run the TAC on a schedule every night and remove old items in small batches.  It is important to run this often, like every night, in order to keep the number of deleted items low. That way the SQL ghost cleanup process works better.

One approach could be to delete all items older than some number of days, let’s say 180 days. This could be combined with a restriction to keep attachments linked to active or resolved bugs.  Doing this every night ensures that only small amounts of data are deleted.

<!-- Scenario : Remove old items except if they have active or resolved bugs -->
<DeletionCriteria> 
  <TestRun> 
    <AgeInDays OlderThan="180" /> 
  </TestRun> 
  <Attachment /> 
  <LinkedBugs>     
     <Exclude state="Active" /> 
     <Exclude state="Resolved"/>
  </LinkedBugs> 
</DeletionCriteria>

In my experience there are projects which are left with active or resolved work items, although no further work is done.  It can be wise to have a cleanup process with no restrictions on linked bugs at all. Note that you then have to remove the whole LinkedBugs section.
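The two variants, with and without the LinkedBugs section, differ only in that one section, so the settings text can be generated rather than maintained by hand. The following is a minimal sketch, assuming the settings file needs nothing beyond the elements shown in this post; the function name `build_settings` and its parameters are made up for the illustration.

```python
import xml.etree.ElementTree as ET


def build_settings(older_than_days, keep_bug_states=("Active", "Resolved")):
    """Return TAC settings XML text that targets test runs older than N days.

    Hypothetical helper: pass keep_bug_states=() to leave out the
    LinkedBugs section entirely.
    """
    root = ET.Element("DeletionCriteria")
    test_run = ET.SubElement(root, "TestRun")
    ET.SubElement(test_run, "AgeInDays", OlderThan=str(older_than_days))
    ET.SubElement(root, "Attachment")
    if keep_bug_states:
        linked = ET.SubElement(root, "LinkedBugs")
        for state in keep_bug_states:
            ET.SubElement(linked, "Exclude", state=state)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(build_settings(180))                       # keeps runs with active/resolved bugs
    print(build_settings(180, keep_bug_states=()))   # no LinkedBugs section at all
```

Writing the returned text to a file gives a settings file for the nightly run; the unrestricted variant is the one to use for a sweep that ignores linked bugs.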

An approach which could work better here is a two-step approach. Use the schedule above, with no LinkedBugs section, as a sweeper task that takes away all data older than anything you could care about.  Then have another scheduled TAC task that more specifically takes out attachments you are not likely to use. This task could be much more specific and, based on your analysis, clean out what you know is troublesome data.

<!-- Scenario : Remove specific files early -->
<DeletionCriteria>
    <TestRun >
        <AgeInDays OlderThan="30" />
    </TestRun>
    <Attachment>
        <SizeInMB GreaterThan="10" />
        <Extensions>
            <Include value="iTrace"/>
            <Include value="dll"/>
            <Include value="pdb"/>
            <Include value="wmv"/>
        </Extensions>
    </Attachment>
    <LinkedBugs>
        <Exclude state="Active" />
        <Exclude state="Resolved" />
    </LinkedBugs>
</DeletionCriteria>

The readme document for the TAC says that it recognizes “internal” extensions, but it does in fact recognize any extension.

To run the tool, use the following command:

tcmpt attachmentcleanup /collection:your_tfs_collection_url /teamproject:your_team_project /settingsfile:path_to_settingsfile /outputfile:%temp%/teamproject.tcmpt.log /mode:delete

 

Shrinking the database

You could run a shrink database command after the TAC has run in cases where a lot of data has been deleted.  In this case you SHOULD do it, to free up all that space.  But after the shrink operation you should rebuild the indexes, since the shrink operation will leave the database in a very fragmented state, which will reduce performance. Note that you need to rebuild the indexes; reorganizing is not enough.

For smaller amounts of data you should NOT shrink the database, since the freed space will be reused by the SQL Server when it needs to add more records.  In fact, it is regarded as bad practice to shrink the database regularly.  So on a daily maintenance schedule you should NOT shrink the database.

To shrink the database you run a DBCC SHRINKDATABASE command, and then follow up with an index rebuild (ALTER INDEX ... REBUILD, or the older DBCC DBREINDEX).  DBCC INDEXDEFRAG only reorganizes the indexes, which is not enough after a shrink.  I find the easiest way to do this is to create a SQL Maintenance plan including the Shrink Database Task and the Rebuild Index Task and just execute it when you need to do this.
