Blog of Keith Craig, containing my thoughts, comments and questions. RSS Feed


Roughly 3% penalty for indexing SqlDataReader columns by string rather than int

When I was first learning .NET, I was told that indexing the columns of a SqlDataReader using an int was MUCH faster than using the string indexer. After I got over my initial horror (the string indexer was so much easier to read!), I decided that if this was the way it was, at least I could use an enum to enumerate the columns of a resultset.

However, I recently decided to test this. I wrote the following simple code to test it, using the pubs database. (You can paste this code into a new console app project and it should work if you have pubs on the local default SQL Server instance.)

using System;

using System.Collections.Generic;

using System.Data;

using System.Data.SqlClient;

 

namespace ReaderIndexerTest

{

    class Program

    {

        static void Main(string[] args) {

            DateTime start = DateTime.Now;

 

            for (int i = 1; i < 1000 ; i++ )

                IntIndexer();

 

            DateTime end = DateTime.Now;

            TimeSpan duration = end.Subtract(start);

 

            Console.WriteLine(duration.TotalMilliseconds.ToString());

            Console.ReadKey();

        }

 

        static string ConnStr {

            get {

                return "server=(local);database=pubs;Trusted_Connection=Yes;";

            }

        }

 

        /// <summary>

        /// Use integers to index the columns of a DataReader

        /// </summary>

        static void IntIndexer() {

            List<Author> list = new List<Author>();

 

            SqlConnection conn = new SqlConnection(ConnStr);

            conn.Open();

            SqlCommand cmd = new SqlCommand("select * from authors", conn);

            using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection)) {

                while (reader.Read()) {

                    Author a = new Author();

 

                    a.au_id = (string)reader[0];

                    a.au_lname = (string)reader[1];

                    a.au_fname = (string)reader[2];

                    a.phone = (string)reader[3];

                    a.address = (string)reader[4];

                    a.city = (string)reader[5];

                    a.state = (string)reader[6];

                    a.zip = (string)reader[7];

                    a.contract = (bool)reader[8];

 

                    list.Add(a);

                }

                reader.Close();

            }

        }

 

        /// <summary>

        /// Use strings to index the columns of a DataReader

        /// </summary>

        static void StringIndexer() {

            List<Author> list = new List<Author>();

 

            SqlConnection conn = new SqlConnection(ConnStr);

            conn.Open();

            SqlCommand cmd = new SqlCommand("select * from authors", conn);

            using (SqlDataReader reader = cmd.ExecuteReader(CommandBehavior.CloseConnection)) {

                while (reader.Read()) {

                    Author a = new Author();

 

                    a.au_id = (string)reader["au_id"];

                    a.au_lname = (string)reader["au_lname"];

                    a.au_fname = (string)reader["au_fname"];

                    a.phone = (string)reader["phone"];

                    a.address = (string)reader["address"];

                    a.city = (string)reader["city"];

                    a.state = (string)reader["state"];

                    a.zip = (string)reader["zip"];

                    a.contract = (bool)reader["contract"];

 

                    list.Add(a);

                }

                reader.Close();

            }

        }

    }

 

    /// <summary>

    /// Quick and dirty class to hold data to simulate object population

    /// </summary>

    class Author

    {

        public string au_id;

        public string au_lname;

        public string au_fname;

        public string phone;

        public string address;

        public string city;

        public string state;

        public string zip;

        public bool contract;

 

        // For debugging

        public override string ToString() {

            return au_id + "\t" + au_lname + "\t" + au_fname + "\t" + phone;

        }

    }

}

 

To simulate the overhead of actually reading the data, this code calls the database 1000 times, gets the entire contents of the dbo.authors table, and builds a list of objects using the data. In order to switch which indexer method is used, edit the method called by the for loop in Main()—use either IntIndexer() or StringIndexer().

When I ran it, here's what I found on my machine:

In other words, the string indexer was about 3% or so slower than the int indexer, including the query overhead. (I purposely choose a simple query with a reasonable amount of data, so that the database execution time would not dominate the execution time.) Note that there may be other combinations of resultset columns, rows, data, etc that return slightly different results, but this query is similar to the types of resultset I often include in applications. (Well, actually it's about the same amount of data, but it's a much simpler query.)

For me at least, the string indexer is clearly a winner. It is MUCH easier to read, much more resilient to refactorings, and has nearly as good performance. And I've learned my lesson—rather than believing performance folklore, it's important to actually do some simple tests and see if there's a significant difference in performance.

There may be some applications where 3% or so makes a huge difference, but for the types of business applications I usually write, that performance gain is not worth the loss of code readability and maintainability, especially when I have nearly a thousand queries that all need to be maintained. When performance becomes an issue, I'll optimize the problematic section of code (and query), rather than optimize all the readers prematurely.

 
Posted by Keith Craig | 4 Comments | Trackback Url | Bookmark with:        
Tags:

Links to this Post

Comments

Monday, 4 Jun 2007 05:46 by 3% is a lot!
Given that the round-trip to the database is part of the execution path, any query, even one that returns a static result set, will *always* dominate the execution time. SQL Server is in a separate process and just the context switch alone is expensive. That you were able to produce a small but noticeable difference in code that includes such a huge factor means integer indexing is much faster than string indexing. This does not mean I suggest using integer indexing. In fact, I recommend always using string indexing because unless you care about 3% (and that will be so in less than 3% of the cases), the gain in readability (i.e., maintainability) offsets the performance loss by a huge margin.

Tuesday, 5 Jun 2007 12:13 by Re: Roughly 3% penalty for indexing SqlDataReader columns by string rather than int
For those obsessive developers that really want that 3%, you could call GetOrdinal on the DataReader after opening it to convert string indexes to int indexes and store them in well named int variables, then use the int variables inside the loop. Still readable and resilient to change but that would just be silly. :) -- Jason Stangroome www.codeassassin.com

Thursday, 14 Jun 2007 04:45 by Roughly 3% penalty for indexing SqlDataReader columns by string rather than int
I'd suggest that both ways the column has been indexed here aren't the best approach. The first isn't great because the ordinal of particular columns could change, the second isn't great because you're hard coding strings... something that would strike me as a bad move from a maintenance point of view. The strings have to be somewhere though, and I would suggest private constants of the class. Then you can lookup the column ordinals once by that string and get that extra %-age of performance while having maintainable code. // 'au_id'... Pascal case columns names *please*... private const string IdColumn = "au_id"; private const string FirstNameColumn = "au_fname"; ... void Method() { ... int idOrdinal = reader.GetOrdinal(IdColumn); ... while (reader.Read()) { author.Id = reader.GetString(idOrdinal); } } Oh, also, you do know there are strongly typed methods on the DbDataReader class to read values don't you??

Thursday, 28 Jun 2007 05:00 by Reply
> The strings have to be somewhere though, and I would suggest private constants of the class. Then you can lookup the column ordinals once by that string and get that extra %-age of performance while having maintainable code. Thanks for the reply—this is an interesting idea, and for certain projects, I think what you are suggesting is a good idea. However, there's a tradeoff here: using private variables to track the ordinal columns makes the code robust against renaming of columns, but it requires more code, both to write and to maintain. This type of decision often depends on the methodology and team organization of the project. If you have a separate team in charge of the database, then I think this approach may very well be worthwhile. However, in the case of our project, the same cross-functional team manages the code and the database schema, so the column names hardly ever change. In this case, I don’t see much value in it. Also, in a web application, depending on your architecture, you may have to go out of your way to have stateful datalayer object instances that survive between requests. Otherwise, you pay the lookup cost on every request. This would actually be slower than indexing by a constant string in the case of single-row resultsets. > Oh, also, you do know there are strongly typed methods on the DbDataReader class to read values don't you?? Yes, I do. =)

Name:
URL:
Email:
Comments:

CAPTCHA Image Validation