WHERE Clause Functions: A Word of Caution…

04.23.2012 22:32 by kbeckman | Comments

A little more than a year ago, I wrote a post explaining a way of refactoring stored procedures to reduce the number of similar procedures that differed only by the logic included in the WHERE clause…

 

Here’s my exact quote:

“Your codebase has a glut of repeated stored procedure logic for “get” or SELECT-based operations. Table JOINS, stored procedure parameters (for row filtering) and return types are similar among a focused set of stored procedures. The focused stored procedure group resembles the following pattern: GetEntityById, GetEntitiesByParentId, GetEntityByExactName, GetEntitiesByPartialName, GetEntityBySomeOtherParameterCombo.

Use the T-SQL COALESCE function in the procedure’s WHERE clause logic to implement optional parameters and combine the many similar, individual procedures into a single procedure with functionality for multiple input parameter combinations that yield a single return type.”

 

If implemented properly, the approach I detailed could significantly reduce the number of similar SELECT stored procedures in the database making your life easier as a developer having fewer procedures to maintain. I recently discussed this approach with our Head DBA and he had some reservations claiming that function usage in WHERE clauses could lead to performance issues. With a little more insight, I wanted to take an opportunity to evaluate my original procedure example and refactor it if I could find a better way.

 

Optimization Guidelines for WHERE Clauses

After an hour or so of research, AdventureWorks testing and several articles later (see Resources section at the end of this post), I’ve come up with a few guidelines to help squeeze the best performance out of your procedure execution – at least as far as your WHERE clause is concerned.

  • Know How to Read Your Execution Plan – There’s probably no better tool for performance feedback and optimization than SQL Server’s query execution plan viewer. This will present a visual representation of SQL Server’s query execution operations along with valuable metrics such as execution time, CPU cost, I/O cost, number of rows processed, etc. Seek out and destroy/minimize all Table or Index Scans!
  • Make Your WHERE Clause Columns Immediately Visible to SQL’s Query Optimizer – In other words, don’t bury your filter columns inside a function. Instead call-out the column followed by whatever function you need to complete the expression enabling the Query Optimizer to identify and use any indexes related to the column. Here’s a couple of examples: 
    • DO – WHERE Person.LastName LIKE ‘Beck%’
    • DON’T – WHERE SUBSTRING(Person.LastName, 5) = ‘Beck’
    • DO – WHERE Person.DOB > DATEADD(yyyy, –21, GETDATE())
    • DON’T – WHERE DATEDIFF(yyyy, Person.DOB, GETDATE()) > 21
  • Avoid Nested AND/OR Logic – There may be a creative way to eliminate the use of simple OR chains in your WHERE clauses. If there is, you should use it. While this isn’t a hard-and-fast rule, it is what I identified in my example procedure as something that could be optimized. By rethinking the expression, I was able to reduce the query cost of an index scan on Address.City. Aside from a performance increase, WHERE clause logic will be easier to read if your expressions are simpler. You should treat nested WHERE clause logic in your T-SQL the same way you should treat nested IF statements in your application code – eliminate it with extreme prejudice.

 

Let’s Review the Original Example

Below is the original stored procedure example in its entirety. According to the three guidelines above, the procedure has already called-out the filter column in the WHERE clause making it immediately visible to the Query Optimizer; I’ve evaluated the execution plan (actual output is below); and I’ve identified an ugly nested OR that can be cleaned up…

-- ==========================================================================================
-- Author:      Keith Beckman
-- Create date: 1/7/2011
-- Description: Selects complete address data based on a variety of input criteria.
-- ==========================================================================================
CREATE PROCEDURE [Person].[GetAddresses]
    @AddressId            INT = NULL
    , @PartialCity        NVARCHAR(30) = NULL
    , @StateProvinceId    INT = NULL
AS
BEGIN
 
    SET NOCOUNT ON;
 
    SELECT TOP 20    
        A.AddressID
        , A.AddressLine1
        , A.City
        , A.PostalCode
        , SP.Name
        , CR.Name
    FROM    Person.Address A
            INNER JOIN Person.StateProvince SP ON SP.StateProvinceID = A.StateProvinceID
            INNER JOIN Person.CountryRegion CR ON CR.CountryRegionCode = SP.CountryRegionCode
    WHERE    A.AddressID        = COALESCE(@AddressId, A.AddressID)
        AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
        AND((@PartialCity IS NULL AND A.City = A.City) OR A.City LIKE '%' + @PartialCity + '%')
END

GO

 

Below is the SQL Server actual query plan results for the following procedure call…

--Get all addresses with a city name like "Birm"...
--Replaces: GetAddressesByCityName
EXECUTE AdventureWorks.Person.GetAddresses 
  @PartialCity = 'Birm'
GO

 

image  image

 

One thing to note, using the AdventureWorks database is a contrived example and due to the database size any optimizations will be unnoticeable to the naked eye. However, when you’re working with extremely large tables in a high traffic DB, minor enhancements like the following can make a big difference. Notice the Index Scan above. At 65% of the entire query cost, this could be a big bottleneck under the right conditions.

 

Let’s Tweak the WHERE Clause

WHERE    A.AddressID       = COALESCE(@AddressId, A.AddressID)
   AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
   AND A.City              LIKE '%' + COALESCE(@PartialCity, A.City) + '%'

 

As I mentioned earlier, all we really needed to do for this query is optimize the WHERE clause filter for the Address.City column. Instead of the nested OR that evaluates whether or not the @PartialCity parameter IS NULL and then builds a LIKE expression, I made the entire check a LIKE expression that was immediately aware of the target column, Address.City. Executing the same procedure call against the optimized procedure produced the execution plan results below.  As you can see from the Clustered Index Scan statistics (previously the most expensive part of the query as defined by Cost %) the Estimated I/O Cost and Estimated CPU Cost figures were reduced dramatically. I didn’t eliminate the Clustered Index scan completely, but it is a significantly less percentage of the overall execution cost. The NonClustered Index Seek operation increased by a factor of 6, but since this is always a faster operation than a Scan, it’s more efficient for the overall query execution.

image   image

 

In Closing

There’s always room for refactoring in every line of code you write. Sometimes it will be a year or more before your original code becomes a problem. Or it could take that long for you to circle back around to it… Check your ego at the door and question everything. No matter how good you think you are or how good you really are, you’ve written code that can be improved for both performance and readability.

 

Resources

The Cost of Function Use in a WHERE Clause

Avoid SQL Server Functions in the Where Clause for Performance

Execution Plan Basics

T-SQL Sproc Optional Parameters Using COALESCE

01.26.2011 22:42 by kbeckman | Comments

Overview

Your codebase has a glut of repeated stored procedure logic for “get” or SELECT-based operations. Table JOINS, stored procedure parameters (for row filtering) and return types are similar among a focused set of stored procedures. The focused stored procedure group resembles the following pattern: GetEntityById, GetEntitiesByParentId, GetEntityByExactName, GetEntitiesByPartialName, GetEntityBySomeOtherParameterCombo.

 

Use the T-SQL COALESCE function in the procedure’s WHERE clause logic to implement optional parameters and combine the many similar, individual procedures into a single procedure with functionality for multiple input parameter combinations that yield a single return type.

 

Motivation

I’m currently working in a codebase with an overwhelming amount of T-SQL stored procedures. There’s an individual stored procedure for every CRUD operation you can think of… Unfortunately, with this particular application, it doesn’t make a lot of sense to start swapping out stored procedures and ADO.NET code for a more sexy ORM. The existing data access patterns already work well and all of the developers are very familiar with it. In addition, there are areas of the codebase where the domain model is coupled a bit too tightly to the data access code. The refactoring effort to introduce an ORM would be more than the deadlines allow for. An alternative is to reduce the number of similar “get” stored procedures into a single procedure that provides all of your required functionality. This approach can significantly impact the overall procedure count and make both database management and stored procedure enhancement easier .

 

Creating a single stored procedure for all of your “get” operations is possible with WHERE clause usage of the T-SQL COALESCE function. COALESCE takes an expression list and applies the first item that is NOT NULL. The expression list allows for the addition of stored procedure parameters as well as column values. Initializing all of your optional parameters = NULL in the parameter declaration will initialize the parameter as NULL if a value is not given by a call to the procedure. Apply a filter in the WHERE clause using the COALESCE function that uses your optional parameter and provide an alternate expression (if your parameter evaluates to IS NULL) that will always logically evaluate to TRUE. This will make sense when you evaluate the example…

 

Mechanics

1) Create a new stored procedure that will replace all of the existing “get” procedures for a given entity. Name it with a generic name that doesn’t indicate what parameter combination is required – GetEntity.

2) Add all of the possible input parameters to the new procedure. All required parameters should be declared as you normally would declare them. All optional parameters should be initialized as NULL in the declaration.

 

CREATE PROCEDURE dbo.GetEntity
    @EntityId            INT = NULL
    , @PartialEntityName NVARCHAR(30)
    , @ParentEntityId    INT = NULL
AS
BEGIN
...

3) For all optional filtering parameters, use the T-SQL COALESCE keyword to build an expression in the WHERE clause that first uses your optional parameter if it is NOT NULL and some other filter value(s) if your optional parameter IS NULL.

 

WHERE SomeTable.EntityId            = COALESCE(@EntityId, SomeTable.EntityId)
    AND SomeTable.ParentEntityId    = COALESCE(@ParentEntityId, SomeTable.ParentEntityId)
    AND((@PartialEntityName IS NULL AND SomeTable.EntityName = SomeTable.EntityName) 
        OR SomeTable.EntityName LIKE '%' + @PartialEntityName + '%')

 

4) Replace the usage of an existing “get” procedure with your new stored procedure and test its behavior.

5) Make sure any related unit tests are passing.

6) Repeat steps 4 and 5 for any usages of your old “get” stored procedures.

7) Be sure to tune/update any existing indexes or create additional indexes to accommodate your search criteria.

 

Example

This example uses the AdventureWorks sample DB in SQL 2008.

 

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- ==========================================================================================
-- Author:      Keith Beckman
-- Create date: 1/7/2011
-- Description: Selects complete address data based on a variety of input criteria.
-- ==========================================================================================
CREATE PROCEDURE Person.GetAddresses
    @AddressId            INT = NULL
    , @PartialCity        NVARCHAR(30)
    , @StateProvinceId    INT = NULL
AS
BEGIN

    SET NOCOUNT ON;

        SELECT TOP 20    
        A.AddressID
        , A.AddressLine1
        , A.City
        , A.PostalCode
        , SP.Name
        , CR.Name
    FROM    Person.Address A
            INNER JOIN Person.StateProvince SP ON SP.StateProvinceID = A.StateProvinceID
            INNER JOIN Person.CountryRegion CR ON CR.CountryRegionCode = SP.CountryRegionCode
    WHERE    A.AddressID             = COALESCE(@AddressId, A.AddressID)
            AND A.StateProvinceID    = COALESCE(@StateProvinceId, A.StateProvinceID)
            AND((@PartialCity IS NULL AND A.City = A.City) OR A.City LIKE '%' + @PartialCity + '%')
END
GO

 

Here’s some usage examples to get an idea of what’s possible with the dynamic input parameter stored procedure approach…

 

--Get all addresses with a city name like "Birm"...
--Replaces: GetAddressesByCityName
EXECUTE AdventureWorks.Person.GetAddresses 
  @PartialCity = 'Birm'
GO

--Get all addresses with a city name like "Birm" that are in Alabama...
--Replaces: GetAddressesByCityNameAndStateProvinceId
EXECUTE AdventureWorks.Person.GetAddresses 
    @PartialCity = 'Birm', 
    @StateProvinceId = 3
GO

--Get all addresses in Alabama...
--Replaces: GetAddressesByStateProvinceId
EXECUTE AdventureWorks.Person.GetAddresses 
    @StateProvinceId = 3
GO

--Get the address with AddressId = 49...
--Replaces: GetAddressById
EXECUTE AdventureWorks.Person.GetAddresses 
    @AddressId = 49
GO

 

As you can see, there are numerous usages under this approach (certainly not limited to what I’ve shown above). The best part about this is that you’ve significantly reduced the amount of similar “Get” procedures you need to write! Any time you need to add additional JOINS or WHERE clauses, you can add them in a single procedure rather than replacing logic that’s scattered all over your codebase.

 

Now you and your DBA can call it a day… He/she owes you a beer after this one. Or if they helped you to arrive at this approach like mine did – you owe them one.