Rip's Domain

Other SQL optimization tips

Posted in Microsoft, MSSQL, SQL, TechSupport by rip747 on March 28, 2008

Ben started a whirl wind of ideas with his recent post about optimizing SQL queries for better performance. I was getting too carried away with commenting and decided that further tips that I had should be put down in my own post. Below are the ideas that I commented on, along with some new ones.

1) Create indexes on foreign key(s)

I think, no wait, I know that this is the most overlooked design aspect on almost every database. Far too often people forget to create an index for the foreign key(s) that are on a table. I guess the reason is because since all database servers automatically create an index on the columns that you specify to make up the primary key, why shouldn’t it do the same for foreign keys? Well it doesn’t and this is why it’s the most overlooked and biggest culprit of performance lost in a database.

2) Normalization

How many times have you seen a statement that joins 10 different tables all to grab a single value from each table? I’m so guilty of this, I should be thrown to the wolves. Again this is a HUGE performance hog and can be avoided by sitting down and taking the time to normalize your database. What is normalization you ask? It’s the idea that some data that is in one table can be copied into another table to prevent a join from occurring. I guess an example I can give is:

You have 2 tables named employees and companies. Companies has an one to many relation to Employees. When you write a query to retrieve an employee’s company name, you create a join between the employees table and the companies table and retrieve the company name.

It’s simple and we all do it and it’s probably not the greatest example, but I wanted something simple. Now if you wanted to normalize this, you would create a column on the employees table called company_name and copy the value from the companies table into that column, thus preventing you from have to create a join when retrieving the company name in a query which will improve the performance of the query. This can be accomplished at the application level or by using triggers within the database.
Now I wouldn’t use normalization in this situation, again I wanted something simple to explain the concept. With that said, when should you start and how do you determine what to normalize all depends on the data and the time the queries are taking to retrieve it. Most of time though I only normalize simple data that isn’t changed too often. For that stuff I use views.

3) Views and Indexed Views

Yes they’re different and indexed views aren’t supported on all database platforms so this doesn’t pertain to everyone.

Normalization is tough to deal with and maintain. So how do we go about increasing the performance of our database with copying columns and data everywhere, we use views!

Views are basically queries that you can save and then use like normal table when writing other queries. The idea is that instead of writing the same sub-queries or derived tables all the time and throughout your application, you can move those statements into views and then interact with them as you would with any other table.

Views are a great place to house the business logic of your application when you don’t want to or can’t use stored procedures. I use them a lot in applications to calculate expiration dates of memberships, totaling up line items in a shopping cart or just about any other calculations that the application needs.

Imagine trying to maintain a set of complicated calculations and business logic in an application that is copied over and over in queries scattered everywhere? You can’t!

Now I understand that most people argue that calculations like these should be moved into classes with the application, but I’ve found that that’s not always a smart thing to do and can sometimes cause bigger performance lost then realized. Imagine if you would, you have a method in your application that calculates the expiration date of a member and you need to access to this information at the database level. Well if the calculation is performed at the application level, that means you will need to copy and translate the logic to your database. By moving this calculation to a view you now have access to the information from both your database and the application and it’s all in one place! Also now since the calculation is being performed at the database level it makes you write less code in your application and not have to calculation for each record you return.

Ok great… so what is an indexed view? An index view is the basically the same thing as a regular view only you have the option of creating indexes on it where as a regular view you can’t. This can greatly increase the performance of queries that use views or span across multiple views. BE FOREWARNED though there are STRICT guidelines that you must follow in the creation and use of these views and they differ between database servers. You must consult your database servers documentation when attempting to use them.

Examples on using and creating indexed views in MSSQL 2000 can be found here.

4) Clustered Indexes

At the beginning of this article I mentioned that placing indexes on foreign key can be a big help and that almost all database server automatically create indexes for primary keys. Well expanding on that is the use of clustered indexes versus regular indexes.

When you create a regular old index on a table your database server basically creates a file on the server with information about where everything is within that table. Nothing happens to the data within the original table. Not so when you create a clustered index. A clustered index tells the database server how to physically write the data for this table onto the server’s disk. This is the reason why you can only have one clustered index on a table and it’s THE MOST important decision you can make when talking about performance.

Remember the advice at the beginning about placing index on foreign keys? Well to expand on that, you should also determine if your foreign key should be used as or be part of the clustered index on the table.

Let’s look back at the scenario I gave between the employees and the companies. After we’ve written our application, we noticed that throughout our application, there are many times when we query to list or find employees that are part of a particular company. Further investigation reveals that the only time we’re ever querying the employees table directly is when We’re authenticating a login or retrieving their profile.

By looking at this scenario it would probably make sense to include the company foreign key within the clustered index on the employees table and make it the first column of the index. Reason is that this will dramatically speed up the seek time of getting all the employees for a company since they will all be located around each other.

Now let me pull the reigns back a little. I’m not saying that you should go about doing this on every table in your database. There are very specific situations where this action makes sense and it doesn’t occur often. The only way to be completely sure is to load test your database in a testing environment with the change. Another HUGE WARNING! Doing this will cause the physical restructuring of data on the disk and as such it can take an incredibly long time to complete this change on large tables. Again, only testing can determine if a change like this is worth making.

5) Dropping, Rebuilding and Defragging Indexes and General Maintenance

When was the last time you rebuild or defragged your indexes?

Have you updated the statistics on the database lately?

How about checked the integrity of your data?

When was the last time you backed up your data?

Do you have any idea what I’m saying?

If your aren’t continually performing proper maintenance on your database, none of the ideas I talked about make any sense doing. Without proper maintenance your database will continue to degrade in performance no matter what you do. Almost all aspects of maintenance is automated and it’s so simple there’s no reason not to do it. MSSQL is especially easy since it has the Database Maintenance Plan wizard to guide you though it all. Check out the documentation the came with your database server to see what maintenance options or wizards it comes with.

Tagged with: , ,

3 Responses

Subscribe to comments with RSS.

  1. Ben Nadel said, on March 28, 2008 at 11:06 am

    @Tony,

    Clearly I know very little about indexing! Is there a good book you would recommend on this aspect of DB administration?

  2. rip747 said, on March 28, 2008 at 2:00 pm

    Actually databasejournal.com is one hell of a resource for all this type of stuff.

    check out the following page:

    http://www.databasejournal.com/features/article.php/3585206

    click on the link that corresponds to your database and it will show all the articles for it.

    another great resource is sqlservercentral.com:

    http://www.sqlservercentral.com/

    registration is free and they have some really great material and cover everything that’s going on with MSSQL.

    a great blog is the one Pinal Dave from SQLAuthority:

    http://blog.sqlauthority.com/

    this guy knows his shit!!!

    and finally how could we forget the man, the myth, the legend of all things sql, cf and bs…. ME🙂

    https://rip747.wordpress.com/category/sql/

  3. Keith said, on July 12, 2010 at 9:45 pm

    I think the term you are trying for is “DE”normalization. “Normalization” removing as much redundant data as possible, “denormalization” is putting that redundant data back in.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: