Friday, May 20, 2011

Executing Heavy Set-Based Operations Against VLDBs in SQL Server

Recently I had to design and execute some heavy set-based T-SQL operations against a large number of very large databases (VLDBs) within a SQL Server 2005 instance.

I won't enter the Row-Based vs. Set-Based processing debate right now, as I plan to write a dedicated article on it in the near future; instead, I will discuss some concerns that need to be taken into account when performing such operations.

First of all, in terms of hardware resources, you will need enough RAM and, of course, more than one processor (remember to check the processor affinity setting of your SQL Server instance).

Now, the most important aspect when executing such set-based operations against large databases is transaction log space.

As the operation executes, a significant amount of transaction log space is reserved, thus requiring more storage. This happens because the operation is not committed until it has run against the entire data set. So, if your target database's recovery model is set to FULL, every change performed by the set-based operation is fully logged, requiring even more storage.
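
By the way, if you want to see how full each database's transaction log currently is while such an operation runs, a quick way (available in SQL Server 2005 and later) is:

-- Reports log size and percent of log space used for every database on the instance
DBCC SQLPERF(LOGSPACE);
GO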

So, how can you ensure that the set-based operation completes without problems, such as running out of disk space, and thus avoid having it terminate abnormally?

Well, there are (at least) two approaches: the easy one and the complex one.

The complex approach is to break up the set-based operation so that it targets smaller data sets within the database. Not so good, right? I don't like it either! In my opinion, this should be the very last option, as “segmenting” the target data set is not a trivial task, mainly in terms of data consistency (see the sketch right below).
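
This minimal sketch shows what such a segmented operation could look like; everything in it is hypothetical: the table dbo.BigTable, its columns, and the batch size of 10,000 rows. The key point is that each batch commits on its own, so the transaction log space can be reused between batches:

-- Minimal batching sketch; dbo.BigTable, SomeColumn, Processed
-- and the batch size are all hypothetical
SET NOCOUNT ON;

WHILE 1 = 1
BEGIN
    UPDATE TOP (10000) dbo.BigTable
    SET SomeColumn = UPPER(SomeColumn),
        Processed = 1
    WHERE Processed = 0;

    -- Exit when no rows are left to process
    IF @@ROWCOUNT = 0 BREAK;
END
GO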

And here’s the simple approach:

1. Find a date and time when there are no operations targeting the database (this is effectively a downtime window :)

2. Set the database's recovery model to SIMPLE, as in the following example:

ALTER DATABASE [Database_Name] SET RECOVERY SIMPLE
GO
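
Optionally, you can verify that the change took effect (the sys.databases catalog view and its recovery_model_desc column are available from SQL Server 2005 onwards):

SELECT name, recovery_model_desc
FROM sys.databases
WHERE name = 'Database_Name';
GO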

3. Execute the set-based operation.
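
Just to give an idea, a "heavy set-based operation" in this context could be a single statement that modifies millions of rows within one transaction, for example (the object names are again hypothetical):

UPDATE dbo.BigTable
SET SomeColumn = UPPER(SomeColumn)
WHERE SomeColumn IS NOT NULL;
GO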

4. Set the database's recovery model back to FULL, as in the following example:

ALTER DATABASE [Database_Name] SET RECOVERY FULL
GO
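
One important note here: switching to SIMPLE breaks the transaction log backup chain. So, right after switching back to FULL, take a full (or differential) database backup; until you do, you cannot take log backups and you have no point-in-time recovery for that period. For example (the backup path below is just illustrative):

-- Re-establish the backup chain after returning to FULL recovery
BACKUP DATABASE [Database_Name]
TO DISK = N'D:\Backups\Database_Name.bak'
WITH INIT;
GO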

When your database uses the Full Recovery Model, SQL Server preserves the transaction log until you back it up. This allows you to recover the database to a specific point in time based on the entries in the transaction log. That's why, when this recovery model is selected, SQL Server fully logs every change in the database's log file. Of course, this requires more disk space and slightly affects performance, but it enables you to fully recover your database in the case of a disaster.
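
In practice, under the Full Recovery Model you keep the log file under control by taking regular transaction log backups, along these lines (the path is again illustrative):

-- A routine log backup; this is what preserves point-in-time recovery
BACKUP LOG [Database_Name]
TO DISK = N'D:\Backups\Database_Name_Log.trn';
GO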

When you use the Simple Recovery Model, SQL Server keeps only a minimal amount of information in the log file: transactions are still logged as they execute, but the log is automatically truncated (its space marked as reusable) whenever the database reaches a checkpoint, so log records do not accumulate. Even though this is not the best choice for disaster recovery purposes, it minimizes log administration and allows bulk operations to be minimally logged.
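
As a side note, if you ever wonder why a transaction log is not truncating as expected, the sys.databases catalog view can tell you what is holding truncation up:

-- Shows what, if anything, is preventing log truncation per database,
-- e.g. ACTIVE_TRANSACTION or LOG_BACKUP
SELECT name, log_reuse_wait_desc
FROM sys.databases;
GO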

So, as I had to execute some really heavy set-based T-SQL operations, I changed the recovery model to SIMPLE so that log space would be reused as the work progressed and I would avoid, as far as possible, any disk space issues due to the growth of the transaction log files. Keep in mind, though, that even under SIMPLE the log cannot be truncated past the oldest active transaction, so a single monolithic transaction can still grow the log considerably; that is exactly the case where the “complex” batching approach becomes unavoidable.

Hope this helps!

3 comments:

Mega said...

Well said, I agree with you.

Jack said...

What exactly do you mean when you say that Simple Recovery "keeps only a minimal amount of information in the log file"? Actions are logged just like Bulk_Logged, http://technet.microsoft.com/en-us/magazine/2009.02.logging.aspx#id0060077

Artemakis Artemiou [MVP] said...

Hi Jack,

Simple and Bulk-Logged Recovery models indeed both allow high-performance bulk copy operations.

However, they have some differences.

The Bulk-Logged Recovery Model minimally logs bulk operations; however, it fully logs all other transactions.

Under the Simple Recovery Model, on the other hand, you cannot back up the transaction log at all. Transactions are still written to the log, but it is truncated automatically at every checkpoint, so only the records needed for active transactions are retained.

Also, the Simple Recovery Model requires the least administration.

The reason I used the Simple Recovery Model for this post's example is that I wanted to avoid transaction log backups entirely while executing the heavy set-based operations, and then switch back to the Full Recovery Model.

Now, when it comes to selecting the recovery model for a proper backup strategy in a production environment, you first need to analyze the business requirements that govern the target database(s) in order to make the right choice.

Cheers,
Artemis