A little more than a year ago, I wrote a post explaining a way of refactoring stored procedures to reduce the number of similar procedures that differed only by the logic included in the WHERE clause…

 

Here’s my exact quote:

“Your codebase has a glut of repeated stored procedure logic for “get” or SELECT-based operations. Table JOINS, stored procedure parameters (for row filtering) and return types are similar among a focused set of stored procedures. The focused stored procedure group resembles the following pattern: GetEntityById, GetEntitiesByParentId, GetEntityByExactName, GetEntitiesByPartialName, GetEntityBySomeOtherParameterCombo.

Use the T-SQL COALESCE function in the procedure’s WHERE clause logic to implement optional parameters and combine the many similar, individual procedures into a single procedure with functionality for multiple input parameter combinations that yield a single return type.”

 

If implemented properly, the approach I detailed could significantly reduce the number of similar SELECT stored procedures in the database making your life easier as a developer having fewer procedures to maintain. I recently discussed this approach with our Head DBA and he had some reservations claiming that function usage in WHERE clauses could lead to performance issues. With a little more insight, I wanted to take an opportunity to evaluate my original procedure example and refactor it if I could find a better way.

 

Optimization Guidelines for WHERE Clauses

After an hour or so of research, AdventureWorks testing and several articles later (see Resources section at the end of this post), I’ve come up with a few guidelines to help squeeze the best performance out of your procedure execution – at least as far as your WHERE clause is concerned.

  • Know How to Read Your Execution Plan – There’s probably no better tool for performance feedback and optimization than SQL Server’s query execution plan viewer. This will present a visual representation of SQL Server’s query execution operations along with valuable metrics such as execution time, CPU cost, I/O cost, number of rows processed, etc. Seek out and destroy/minimize all Table or Index Scans!
  • Make Your WHERE Clause Columns Immediately Visible to SQL’s Query Optimizer – In other words, don’t bury your filter columns inside a function. Instead call-out the column followed by whatever function you need to complete the expression enabling the Query Optimizer to identify and use any indexes related to the column. Here’s a couple of examples: 
    • DO – WHERE Person.LastName LIKE ‘Beck%’
    • DON’T – WHERE SUBSTRING(Person.LastName, 5) = ‘Beck’
    • DO – WHERE Person.DOB > DATEADD(yyyy, –21, GETDATE())
    • DON’T – WHERE DATEDIFF(yyyy, Person.DOB, GETDATE()) > 21
  • Avoid Nested AND/OR Logic – There may be a creative way to eliminate the use of simple OR chains in your WHERE clauses. If there is, you should use it. While this isn’t a hard-and-fast rule, it is what I identified in my example procedure as something that could be optimized. By rethinking the expression, I was able to reduce the query cost of an index scan on Address.City. Aside from a performance increase, WHERE clause logic will be easier to read if your expressions are simpler. You should treat nested WHERE clause logic in your T-SQL the same way you should treat nested IF statements in your application code – eliminate it with extreme prejudice.

 

Let’s Review the Original Example

Below is the original stored procedure example in its entirety. According to the three guidelines above, the procedure has already called-out the filter column in the WHERE clause making it immediately visible to the Query Optimizer; I’ve evaluated the execution plan (actual output is below); and I’ve identified an ugly nested OR that can be cleaned up…

-- ==========================================================================================
-- Author:      Keith Beckman
-- Create date: 1/7/2011
-- Description: Selects complete address data based on a variety of input criteria.
-- ==========================================================================================
CREATE PROCEDURE [Person].[GetAddresses]
    @AddressId            INT = NULL
    , @PartialCity        NVARCHAR(30) = NULL
    , @StateProvinceId    INT = NULL
AS
BEGIN
 
    SET NOCOUNT ON;
 
    SELECT TOP 20    
        A.AddressID
        , A.AddressLine1
        , A.City
        , A.PostalCode
        , SP.Name
        , CR.Name
    FROM    Person.Address A
            INNER JOIN Person.StateProvince SP ON SP.StateProvinceID = A.StateProvinceID
            INNER JOIN Person.CountryRegion CR ON CR.CountryRegionCode = SP.CountryRegionCode
    WHERE    A.AddressID        = COALESCE(@AddressId, A.AddressID)
        AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
        AND((@PartialCity IS NULL AND A.City = A.City) OR A.City LIKE '%' + @PartialCity + '%')
END

GO

 

Below is the SQL Server actual query plan results for the following procedure call…

--Get all addresses with a city name like "Birm"...
--Replaces: GetAddressesByCityName
EXECUTE AdventureWorks.Person.GetAddresses 
  @PartialCity = 'Birm'
GO

 

image  image

 

One thing to note, using the AdventureWorks database is a contrived example and due to the database size any optimizations will be unnoticeable to the naked eye. However, when you’re working with extremely large tables in a high traffic DB, minor enhancements like the following can make a big difference. Notice the Index Scan above. At 65% of the entire query cost, this could be a big bottleneck under the right conditions.

 

Let’s Tweak the WHERE Clause

WHERE    A.AddressID       = COALESCE(@AddressId, A.AddressID)
   AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
   AND A.City              LIKE '%' + COALESCE(@PartialCity, A.City) + '%'

 

As I mentioned earlier, all we really needed to do for this query is optimize the WHERE clause filter for the Address.City column. Instead of the nested OR that evaluates whether or not the @PartialCity parameter IS NULL and then builds a LIKE expression, I made the entire check a LIKE expression that was immediately aware of the target column, Address.City. Executing the same procedure call against the optimized procedure produced the execution plan results below.  As you can see from the Clustered Index Scan statistics (previously the most expensive part of the query as defined by Cost %) the Estimated I/O Cost and Estimated CPU Cost figures were reduced dramatically. I didn’t eliminate the Clustered Index scan completely, but it is a significantly less percentage of the overall execution cost. The NonClustered Index Seek operation increased by a factor of 6, but since this is always a faster operation than a Scan, it’s more efficient for the overall query execution.

image   image

 

In Closing

There’s always room for refactoring in every line of code you write. Sometimes it will be a year or more before your original code becomes a problem. Or it could take that long for you to circle back around to it… Check your ego at the door and question everything. No matter how good you think you are or how good you really are, you’ve written code that can be improved for both performance and readability.

 

Resources

The Cost of Function Use in a WHERE Clause

Avoid SQL Server Functions in the Where Clause for Performance

Execution Plan Basics