Gradually Deleting Data in SQL Server

This week, I found a very interesting tip from Glenn Berry explaining how to deal with large delete operations. If you have a situation where you have a very large table in SQL Server, where you need to periodically delete millions of rows, there are several ways to do it.

If you have a maintenance window (or your database is not required to be available 24 x 7 x 365), you can  just delete all of the rows in one shot, using a set based operation. This would be the quickest way to delete a large number of rows, but you will probably end up getting lock escalation to a table lock, which essentially makes the table unavailable during the delete.

A safer, but much more time consuming way to delete millions of rows is to use some sort of looping mechanism, where you gradually delete a fairly small number of rows in a loop, to slowly nibble away at the table. This will take much longer than a set based operation, but, if done properly, will not cause concurrency problems.

Here is a short sample. In this case, we want to delete every row that has a LogId lower than a certain number. We are going to delete 500 random rows that qualify in each delete, and loop 1000 times, with a slight delay between each delete. You can obviously adjust these numbers and the delay time so that it works best in your environment.


SET NOCOUNT ON;

-- Check space used by table before we begin
EXEC sp_spaceused N'dbo.event_log';

-- Declare local variables
DECLARE @NumberOfLoops AS int;
SET @NumberOfLoops = 1000;

DECLARE @CurrentLoop AS int;
SET @CurrentLoop = 0

DECLARE @DeleteSize bigint;
SET @DeleteSize = 500;

DECLARE @MaxLogId bigint;
SET @MaxLogId = 12345678;

WHILE @CurrentLoop < @NumberOfLoops
    BEGIN
        -- Just delete any xxx rows that are below the @MaxLogId
        DELETE
        FROM dbo.event_log
        WHERE LogId IN
            (SELECT TOP(@DeleteSize) LogId
             FROM dbo.event_log WITH (NOLOCK)
             WHERE LogId < @MaxLogId );
            
        WAITFOR DELAY '00:00:00:50';
         
        SET @CurrentLoop = @CurrentLoop + 1;
    END

-- Check space used by table after we are done   
EXEC sp_spaceused N'dbo.event_log';



The WAITFOR just gives the system a slight break in between deleting batches of rows. It helps have less effect on concurrency. Please also note that table partitioning helps this scenario quite a lot, if it’s an option.

Commentaires