Rip's Domain

To “SELECT *” or not. It’s a matter of opinion.

Posted in SQL by rip747 on July 10, 2006

There has always been this rift in the CF community that has bothered me since I typed my first SQL statement into a <cfquery>, whether or not to use “SELECT *” to perform a query of all columns on a table. A lot of people have expressed their opinions from time to time with some really qualified answered and some not so qualified answers. Here are two of the usual ones I come across:

1) By naming the columns you get better performance.
2) It makes the code harder to read.

I have problems with both these answers; let’s examine them. The first one obviously is something that if you’re going to say, you need to back it up with some sort of SQL profile or SQL trace to see what is going on in your database world. Simply stating that it gives you better performance means nothing. So many people make broad claims like this and they have nothing to back it up. Now I will tell you that selecting all the columns in a table when you are only going to use two is a waste of network chatter and database performance. Is it enough to bring down your entire web site? I doubt it unless your web site is getting massive amounts of traffic (I’ll talk about this in a later blog post). However we’re not talking about just selecting two rows here, are we? What we want to know is by using “SELECT *” if we decrease performance on our database server. Well for that we will need to use trusty old SQL Query Analyzer (SQA) and run some queries against a database. I’ll be using [Northwind].[Customers] table for this article since it is installed with SQL Server by default.

select_all_ep.jpg

select_rows_ep.jpg

AS you can see from the Execution Plans for both queries, there is no difference between the two. They both have the exact same I/O and CPU costs.

select_all_trace.jpg

select_rows_trace.jpg

Not much of a difference here at all. Both queries generated 19 reads to the disk. This is really nothing surprising since we got this same exact information from the Execution Plan before.

So as of right now we have seen that both queries affect the CPU and hard drive the same, but what about network traffic? Let’s see below.

select_all_stats.jpg

select_rows_stats.jpg

AH-HA!!!! There is a difference!!! As you can see from the statistics that network traffic by using “SELECT *” is lower than the calling all the columns. To be honest, I expected this. Since you’re sending more text across the pipe, you’re obviously going to have more network traffic. Really this shouldn’t be looked at, but I wanted to show you that, yes, there is a difference some where. Is it anything to be concerned about? Not at all.

So as you can see that by using the Execution Plan, Trace and Statistics displays of SQA there is no difference between using “SELECT *” and writing out the column names, NOTHING, NATHAN, NADA, ZILTCH, ZIP, ZERO. So…. like I said in the blog comment, the only reason people say this is to express their own opinion and nothing else. There is no evidence to back it up.

On to the next answer 2!

This is TOTALLY a matter of opinion and I can’t even give screenshots for this and with that I will give my opinion: TYPE OUT YOUR COLUMN NAMES. I can’t tell you how many times I’ve looked back at a “SELECT *” query I wrote months earlier only to curse myself because I didn’t remember what columns were in the table. Especially now that a lot of people put their queries in CFCS, it more important than ever to type out all your column names. But like everything else in this article, it’s a matter of opinion.

10 Responses

Subscribe to comments with RSS.

  1. Kurt Wiersma said, on July 11, 2006 at 11:04 pm

    I heard that if you use select * on SQL Server 2000 index are not used. I wonder if you like a slightly more completed query on a larger DB if you would actually see more of a difference.

  2. jared chandler said, on July 12, 2006 at 3:09 am

    select * can cause problems with certain queries on coldfusion, usually involving views, where an underlying column is added or removed, but because of the way CF handles connectors / JDBC, the change isn’t reflected in the query until you restart the service or hup the connection to the DB.

    selecting * vs explicit columns doesn’t really make a difference as to performance (on SQL Server) unless there are some BLOB or Text fields in there.. Checkout O’reilly’s SQL Performance Tuning. Best book ever!

  3. Adam Cameron said, on July 12, 2006 at 4:14 am

    I think the issue is not so much when one actually WANTS “all the columns”, it’s when – in reality – one actually only wants a list of IDs or maybe a label value or something, and lazy people have the inclination to not think their requirements through and simply go SELECT * instead (“just in case”).

    For example, as soon as one starts (needlessly) adding LOB type data to a record set that actually only needs the IDs that one is going to see quick a major degradation of CPU, RAM and bandwidth.

    Also… how often does one actually want ALL the columns from the table? I should think this is the exception rather than the rule. And how often – ahead of time – know that one wants ALL the columns from the table, including any additional columns added down the track?

    The performance hit for specifying the columns one wants (even if it’s all the columns the table currently has) is neglibible – as you demonstrate – compared to “accidentally” grabbing a bunch of columns one doesn’t actually need.

    My standpoint here is that one should work out the precise current requirement for the situation, and fulfil that requirement.

    If one designs one’s application carefully, the maintenance coding of adding new DB columns to existing SQL statements should be minimal. And think about it… if the column is NEEDED by your application… you’re going to have to add code use the column anyhow. If it’s NOT needed: why are you retrieving it?

    In summary… I think we are going to have to agree to disagree on the usage of SELECT *. I don’t think it’s an appropriate solution for many situations at all, and should be discouraged as a practice.


    Adam

  4. Trond Ulseth said, on July 12, 2006 at 5:04 am

    Hi Tony,

    I’ve been watching this discution from the sideline for a long time, and it seems to me that the general opinion is that select * is bad.

    I however continue to use it, specially in the read method of my CRUD objects. Why? Simply because the task of this method is to return one complete (in my opinion) record from a db. And I like to know that wherever in my application I need the data from this method I won’t have to worry if the collumns I need there and then are present in the query or not. I could initially list up all the collumns in the select statement, but then I’d have to remember to update the CRUD everytime there’s a change in the DB, which happens now and then.

    For Gateway methods the situation is slightly different though. If I for example need a list of firstnames, I don’t like the idea of returning houndreds or even thousands of full recordsets, with a BLOB fields portraying life story of each person and whatever else. And it could get even uglier with joined selects.

    My 2c on the subject

  5. Tony Petruzzi said, on July 12, 2006 at 8:45 am

    @Kurt

    Indexes are used. From the screen shot you can see that the query is using a clustered index lookup.

    @jared

    You are absolutly correct in this: using SELECT * can cause problems with queries in CF because of the caching in the JDBC driver. That is why I don’t use them.

    @adam

    Your point are very well though out and excellently explained. I agree with you totally. Great comment.

    @trond

    I don’t feel that using SELECT * in a DAO or CRUD CFC is a good idea. As Jared pointed out, the JDBC driver will cache the table columns and if you add a column to the underline table, it will cause problems. But like I’ve said before, it’s your code so do with it what you wish.🙂

    @everyone

    By the way, the only solution I have found for the caching issue with the JDBC driver is to either a) restart the CF service, which is a pain. b) Make sure that “Maintain connections across client requests.” is uncheck for the DSN in the CF administrator.

    If anyone knows how to do something like this through CF using java objects to clear the cache, please let me know. I haven’t had the chance yet to try to figure this out.

  6. Trond Ulseth said, on July 13, 2006 at 4:17 am

    The JDBC driver cache argument is the best one so far against using select *. Actuallly so good that I might have to rethink my opinion.

    We learn as we live🙂

  7. Tony Petruzzi said, on July 13, 2006 at 10:53 am

    @trond

    Could you elaborate on how to use this argument to clear the cache? Are you able to do this while CF is up and running and could you do this through CF code calling java objects?

    Very interesting comment BTW.

  8. Pierre said, on August 25, 2006 at 12:38 pm

    @tony
    Didn’t you get performance issue when unchecking “Maintain connections across client requests.” ?

    I am also wondering to do it or not as I have exeperienced “Error Executing Database Query.Timed out trying to establish connection …” because I limit connections and it seems cfmx7 has an issue in cleaning old inactive sessions…

  9. rip747 said, on August 27, 2006 at 2:32 pm

    @pierre

    Obviously with opening and closing the connections, you will get a performance hit, how big depends on the traffic of the site. Are you personally going to see it, I doubt it.

    I don’t think I mentioned it, but the best way to over come the caching issue with the JDBC drivers is to type out all of the column names and don’t use SELECT *. This thread has some good details on this problem:

    HOUSE OF FUSION

  10. […] all started with my first technical […]


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: