Ben started a whirlwind of ideas with his recent post about optimizing SQL queries for better performance. I was getting carried away in the comments and decided that the rest of my tips belonged in a post of my own. Below are the ideas that I commented on, along with some new ones.

1) Create indexes on foreign key(s)

I think, no wait, I know that this is the most overlooked design aspect of almost every database. Far too often people forget to create an index for the foreign key(s) on a table. I guess the reasoning goes: since database servers automatically create an index on the columns that make up the primary key, why wouldn’t they do the same for foreign keys? Well, most of them don’t, and that’s why this is the most overlooked and biggest culprit of lost performance in a database.
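To make that concrete, here is a minimal sketch using Python’s built-in sqlite3 module. The employees/companies schema and the index name idx_employees_company_id are made up for illustration, and SQLite is just a convenient stand-in; your own server’s plan output will look different.

```python
import sqlite3

# Hypothetical schema: employees.company_id is a foreign key to companies.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        company_id  INTEGER NOT NULL REFERENCES companies(company_id),
        name        TEXT NOT NULL
    );
""")

def plan_for(sql):
    """Return the 'detail' column of EXPLAIN QUERY PLAN as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT name FROM employees WHERE company_id = 42"

# Without an index on the foreign key, SQLite has to scan the whole table.
plan_before = plan_for(query)

# The primary key got its index automatically; the foreign key did not.
conn.execute("CREATE INDEX idx_employees_company_id ON employees (company_id)")

# With the index in place, the same query becomes an index search.
plan_after = plan_for(query)

print(plan_before)
print(plan_after)
```

Comparing the two printed plans is the quickest way to confirm the index is actually being picked up.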

2) Denormalization

How many times have you seen a statement that joins 10 different tables just to grab a single value from each one? I’m so guilty of this, I should be thrown to the wolves. Again this is a HUGE performance hog, and it can be avoided by sitting down and taking the time to denormalize parts of your database. What is denormalization, you ask? It’s the idea that some data that is in one table can be copied into another table to prevent a join from occurring. (Normalization is the opposite: every piece of data lives in exactly one place.) An example I can give is:

You have 2 tables named employees and companies. Companies has a one-to-many relation to employees. When you write a query to retrieve an employee’s company name, you create a join between the employees table and the companies table and retrieve the company name.

It’s simple, we all do it, and it’s probably not the greatest example, but I wanted something simple. Now if you wanted to denormalize this, you would create a column on the employees table called company_name and copy the value from the companies table into that column. That saves you from having to create a join when retrieving the company name, which improves the performance of the query. The copy can be kept up to date at the application level or by using triggers within the database.
Now I wouldn’t use denormalization in this situation; again, I wanted something simple to explain the concept. With that said, when to start and what to denormalize depend on the data and on how long the queries are taking to retrieve it. Most of the time, though, I only denormalize simple data that isn’t changed too often. For everything else I use views.
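Here is a rough sketch of the trigger approach, again using Python’s sqlite3 module as a stand-in database. The trigger names and column layout are my own assumptions, and trigger syntax differs between servers, so treat this as an illustration of the idea rather than something to paste in.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE companies (
        company_id INTEGER PRIMARY KEY,
        name       TEXT NOT NULL
    );
    CREATE TABLE employees (
        employee_id  INTEGER PRIMARY KEY,
        company_id   INTEGER NOT NULL REFERENCES companies(company_id),
        name         TEXT NOT NULL,
        company_name TEXT            -- copied from companies.name
    );

    -- Fill in the copy whenever an employee row is added.
    CREATE TRIGGER trg_employees_fill_company_name
    AFTER INSERT ON employees
    BEGIN
        UPDATE employees
           SET company_name = (SELECT name FROM companies
                                WHERE company_id = NEW.company_id)
         WHERE employee_id = NEW.employee_id;
    END;

    -- Push a company rename down to every copy.
    CREATE TRIGGER trg_companies_sync_name
    AFTER UPDATE OF name ON companies
    BEGIN
        UPDATE employees
           SET company_name = NEW.name
         WHERE company_id = NEW.company_id;
    END;
""")

conn.execute("INSERT INTO companies VALUES (1, 'Acme')")
conn.execute(
    "INSERT INTO employees (employee_id, company_id, name) VALUES (10, 1, 'Ann')"
)
conn.execute("UPDATE companies SET name = 'Acme Corp' WHERE company_id = 1")

# No join needed: the company name is read straight off the employee row.
company_name = conn.execute(
    "SELECT company_name FROM employees WHERE employee_id = 10"
).fetchone()[0]
print(company_name)
```

Note that this sketch doesn’t handle an employee moving to a different company; that would need a third trigger, which is a good illustration of why the copies are a maintenance burden.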

3) Views and Indexed Views

Yes they’re different, and indexed views aren’t supported on all database platforms, so this doesn’t pertain to everyone.

Copying data around like that is tough to deal with and maintain. So how do we go about increasing the performance of our database without copying columns and data everywhere? We use views!

Views are basically queries that you can save and then use like a normal table when writing other queries. The idea is that instead of writing the same sub-queries or derived tables over and over throughout your application, you can move those statements into views and then interact with them as you would with any other table.

Views are a great place to house the business logic of your application when you don’t want to or can’t use stored procedures. I use them a lot in applications to calculate expiration dates of memberships, total up line items in a shopping cart, or do just about any other calculation that the application needs.

Imagine trying to maintain a set of complicated calculations and business logic that is copied over and over in queries scattered everywhere in an application. You can’t!

Now I understand that most people argue that calculations like these should be moved into classes within the application, but I’ve found that that’s not always a smart thing to do and can sometimes cause a bigger performance loss than you realize. Imagine you have a method in your application that calculates the expiration date of a member, and you need access to this information at the database level. If the calculation is performed at the application level, you will need to copy and translate the logic to your database. By moving this calculation to a view, you have access to the information from both your database and the application, and it’s all in one place! Also, since the calculation is now performed at the database level, you write less code in your application and don’t have to run the calculation yourself for each record you return.
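As a small sketch of the expiration-date case, here is a view built with Python’s sqlite3 module. The memberships table, its columns, and the view name v_membership_expiration are all hypothetical, and the date arithmetic uses SQLite’s date() function, which you would swap for your own server’s equivalent.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE memberships (
        member_id   INTEGER PRIMARY KEY,
        joined_on   TEXT NOT NULL,       -- ISO date, e.g. 2024-01-15
        term_months INTEGER NOT NULL
    );

    -- The expiration rule lives here and nowhere else.
    CREATE VIEW v_membership_expiration AS
    SELECT member_id,
           joined_on,
           date(joined_on, '+' || term_months || ' months') AS expires_on
      FROM memberships;
""")
conn.executemany(
    "INSERT INTO memberships VALUES (?, ?, ?)",
    [(1, "2024-01-15", 12), (2, "2024-03-01", 6)],
)

# The application reads the calculated column like any other...
all_rows = conn.execute(
    "SELECT member_id, expires_on FROM v_membership_expiration ORDER BY member_id"
).fetchall()

# ...and other queries can filter on it without repeating the logic.
expiring_in_2024 = conn.execute(
    "SELECT member_id FROM v_membership_expiration "
    "WHERE expires_on < '2025-01-01' ORDER BY member_id"
).fetchall()

print(all_rows)
print(expiring_in_2024)
```

If the rule ever changes (say, a grace period gets added), only the view definition has to be touched.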

Ok great… so what is an indexed view? An indexed view is basically the same thing as a regular view, only you have the option of creating indexes on it, whereas on a regular view you can’t. This can greatly increase the performance of queries that use views or span across multiple views. BE FOREWARNED though: there are STRICT guidelines that you must follow in the creation and use of these views, and they differ between database servers. You must consult your database server’s documentation when attempting to use them.

Examples of using and creating indexed views in MSSQL 2000 can be found here.

4) Clustered Indexes

At the beginning of this article I mentioned that placing indexes on foreign keys can be a big help and that almost all database servers automatically create indexes for primary keys. Expanding on that is the use of clustered indexes versus regular indexes.

When you create a regular old index on a table, your database server basically creates a separate structure on the server with information about where everything is within that table. Nothing happens to the data within the original table. Not so when you create a clustered index. A clustered index tells the database server how to physically order the data for this table on the server’s disk. This is the reason why you can only have one clustered index on a table, and choosing it is THE MOST important decision you can make when talking about performance.

Remember the advice at the beginning about placing indexes on foreign keys? Well, to expand on that, you should also determine if your foreign key should be used as, or be part of, the clustered index on the table.

Let’s look back at the scenario I gave with the employees and the companies. After we’ve written our application, we notice that there are many places where we query to list or find employees that are part of a particular company. Further investigation reveals that the only time we’re ever querying the employees table directly is when we’re authenticating a login or retrieving a profile.

Looking at this scenario, it would probably make sense to include the company foreign key within the clustered index on the employees table and make it the first column of the index. The reason is that this will dramatically speed up the seek time of getting all the employees for a company, since they will all be located next to each other.
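Here is one way to sketch that layout with Python’s sqlite3 module. SQLite has no CREATE CLUSTERED INDEX statement; instead, a WITHOUT ROWID table is stored in primary key order, which I’m using as a stand-in for a clustered index. The table layout is an assumption for illustration, and the syntax on your server will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Rows are physically stored in (company_id, employee_id) order,
    -- so all employees of one company sit next to each other.
    CREATE TABLE employees (
        company_id  INTEGER NOT NULL,
        employee_id INTEGER NOT NULL,
        name        TEXT NOT NULL,
        PRIMARY KEY (company_id, employee_id)
    ) WITHOUT ROWID;
""")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(2, 1, "Bo"), (1, 3, "Cy"), (2, 2, "Di"), (1, 1, "Al")],
)

query = "SELECT name FROM employees WHERE company_id = 1"

# The plan should show a search on the primary key, not a table scan.
plan = " | ".join(
    row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + query)
)

names = [row[0] for row in conn.execute(query + " ORDER BY employee_id")]

print(plan)
print(names)
```

Because company_id is the leading column, the lookup for one company touches a single contiguous range of the table. Lookups by employee_id alone (the login case) would want their own regular index in this layout.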

Now let me pull the reins back a little. I’m not saying that you should go about doing this on every table in your database. There are very specific situations where this action makes sense, and they don’t occur often. The only way to be completely sure is to load test your database in a testing environment with the change. Another HUGE WARNING! Doing this will cause the physical restructuring of data on the disk, and as such it can take an incredibly long time to complete this change on large tables. Again, only testing can determine if a change like this is worth making.

5) Dropping, Rebuilding and Defragging Indexes and General Maintenance

When was the last time you rebuilt or defragged your indexes?

Have you updated the statistics on the database lately?

How about checked the integrity of your data?

When was the last time you backed up your data?

Do you have any idea what I’m saying?

If you aren’t continually performing proper maintenance on your database, none of the ideas I talked about are worth doing. Without proper maintenance your database will continue to degrade in performance no matter what you do. Almost all aspects of maintenance can be automated, and it’s so simple there’s no reason not to do it. MSSQL is especially easy since it has the Database Maintenance Plan wizard to guide you through it all. Check out the documentation that came with your database server to see what maintenance options or wizards it comes with.
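For a feel of what those maintenance steps look like when scripted, here is a sketch against SQLite through Python’s sqlite3 module. The commands (REINDEX, ANALYZE, PRAGMA integrity_check, VACUUM) are SQLite’s; the table and index are throwaway examples, and your server will have its own equivalents and scheduling tools.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        company_id  INTEGER NOT NULL,
        name        TEXT NOT NULL
    );
    CREATE INDEX idx_employees_company_id ON employees (company_id);
""")
conn.executemany(
    "INSERT INTO employees (company_id, name) VALUES (?, ?)",
    [(i % 5, "employee %d" % i) for i in range(100)],
)
conn.commit()

# Rebuild every index from scratch.
conn.execute("REINDEX")

# Refresh the statistics the query planner relies on.
conn.execute("ANALYZE")
stat_rows = conn.execute("SELECT COUNT(*) FROM sqlite_stat1").fetchone()[0]

# Check the integrity of the data; a healthy database reports 'ok'.
integrity = conn.execute("PRAGMA integrity_check").fetchone()[0]

# Rebuild the database file to reclaim and defragment space.
conn.execute("VACUUM")

# Take a backup (here into a second in-memory database).
backup = sqlite3.connect(":memory:")
conn.backup(backup)
backed_up = backup.execute("SELECT COUNT(*) FROM employees").fetchone()[0]

print(integrity, stat_rows, backed_up)
```

In practice you would run something like this from a scheduled job rather than by hand.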

While browsing Reddit, I came across an article about why you should index your database.

In short, the author finally figures out that using an index can decrease the execution time of a query dramatically. It’s an incredibly uninformative article, since there is no mention of what he indexed or any advice as to how you should go about doing it yourself.

Well, being a database guy at heart, I thought that I would give some hints on this subject since it’s a fairly common one. Many times in my career I’ve been asked by people what they should be indexing in their databases. I don’t know why, but people seem scared to place an index on a table. Let me just tell you that you aren’t going to destroy your database by experimenting with indexes. You might slow it down a bit, but you won’t mess up any data. So don’t be afraid to start playing around.

So what should we index? As a rule of thumb you should be indexing the following:

  1. Any foreign keys linking tables together (you will be amazed that this is the most overlooked index to create)
  2. Any columns that you will use to create joins between tables (usually these are the same as the foreign keys, but they could differ sometimes)
  3. Any columns that you will be filtering against in your queries (these are columns in your queries that you reference in your where clauses)

By going down those 3 items, you will most likely see a performance gain in your database. Now there are some gotchas with this:

  1. Be careful not to index a column more than once. This commonly happens if you place an index on a foreign key and then include that foreign key as the leading column of another index. There are obviously exceptions to the rule.
  2. You don’t want to place an index on every column in a table, even if you do use every column in where clauses throughout your application. By doing so you will be killing the write performance of the table. Use judgment in selecting the most important columns within the table.
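The first gotcha can be checked mechanically. Below is a small sketch, using Python’s sqlite3 module, that flags any index whose columns are a strict leading prefix of another index on the same table. The helper name redundant_indexes and the example indexes are made up; PRAGMA index_list and PRAGMA index_info are SQLite-specific, so other servers would need their own catalog queries.

```python
import sqlite3

def redundant_indexes(conn, table):
    """Return (redundant, covered_by) index name pairs for one table.

    An index counts as redundant here when its column list is a strict
    leading prefix of another index's column list. The table name is
    interpolated directly, so only pass trusted names.
    """
    indexes = {}
    for row in conn.execute(f"PRAGMA index_list('{table}')"):
        name = row[1]
        cols = [c[2] for c in conn.execute(f"PRAGMA index_info('{name}')")]
        indexes[name] = cols

    pairs = []
    for a, a_cols in indexes.items():
        for b, b_cols in indexes.items():
            if a != b and len(a_cols) < len(b_cols) \
                    and b_cols[:len(a_cols)] == a_cols:
                pairs.append((a, b))
    return sorted(pairs)

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        employee_id INTEGER PRIMARY KEY,
        company_id  INTEGER NOT NULL,
        last_name   TEXT NOT NULL
    );
    CREATE INDEX idx_employees_company ON employees (company_id);
    CREATE INDEX idx_employees_company_last_name
        ON employees (company_id, last_name);
    CREATE INDEX idx_employees_last_name ON employees (last_name);
""")

overlaps = redundant_indexes(conn, "employees")
print(overlaps)
```

Treat the result as a list of candidates to review, not to drop blindly: a unique index, for example, enforces a constraint even when a wider index covers the same leading columns.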

Can you have too many indexes on a table? Of course you can! One of the ways to tell that you have too many indexes is if you see a lag when altering data on a table (inserting, updating and deleting). Remember that your database server has to maintain and do housecleaning on every index affected by the change. Because of this, if you place too many indexes on a table, every write has to update all of them as well, and that is what causes the lag.
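Timing writes reliably in a short example is tricky, so this sketch uses a rough proxy instead: the number of database pages SQLite ends up with after inserting the same rows into a table with zero, two, or four indexes. The table t and the helper pages_used are hypothetical. More pages means more data the server had to write and keep in order for the same inserts.

```python
import sqlite3

def pages_used(index_count, rows=2000):
    """Insert the same rows into a table with index_count indexes
    (0 to 4) and return how many database pages were needed."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (id INTEGER PRIMARY KEY, a INT, b INT, c INT, d INT)"
    )
    # One single-column index per letter, up to index_count of them.
    for col in "abcd"[:index_count]:
        conn.execute(f"CREATE INDEX idx_t_{col} ON t ({col})")
    conn.executemany(
        "INSERT INTO t (a, b, c, d) VALUES (?, ?, ?, ?)",
        ((i, i * 2, i * 3, i * 4) for i in range(rows)),
    )
    conn.commit()
    pages = conn.execute("PRAGMA page_count").fetchone()[0]
    conn.close()
    return pages

no_indexes = pages_used(0)
two_indexes = pages_used(2)
four_indexes = pages_used(4)
print(no_indexes, two_indexes, four_indexes)
```

The exact numbers depend on the page size and SQLite version, but the count should climb with each added index. For real decisions, measure insert and update times on your own server with realistic data.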

For people just getting started with indexes and using SQL Server, a great way to learn and practice is to use the Index Tuning Wizard that is included with SQL Server. To practice, place a slow-running query or a query that you want to optimize in Query Analyzer and execute it. Next, try figuring out on your own which columns on which tables you should index in order to make the query perform better, and write them down. Then run the Index Tuning Wizard and see if your suggestions match the ones that the wizard comes up with.

Good luck optimizing your tables with indexes. If you have any questions, suggestions or comments, leave them below.

This is funny. I just heard about Microsoft’s Skydrive and thought that I would give it a try. Skydrive is similar to Box.net, which I have an account with. In my opinion if someone is going to give you free online storage and… take it!!! Why use your own bandwidth to share your porn files with your friends and others?

So as I’m signing up, I’m asked to create a Windows Live ID. Now you can create a new ID by opening a HotMail account or use an existing email address. I figured that since I have a GMail account, I would use that… WRONG!!!! It seems that Microsoft doesn’t like the fact that you’re using a rival service, so they force you to enter an alternative email address, since GMail is flagged as a “reserved domain”, which means the domain is banned.

Click here for the proof!!!