WHERE Clause Functions: A Word of Caution…

04.23.2012 22:32 by kbeckman | Comments

A little more than a year ago, I wrote a post explaining a way of refactoring stored procedures to reduce the number of similar procedures that differed only by the logic included in the WHERE clause…

 

Here’s my exact quote:

“Your codebase has a glut of repeated stored procedure logic for “get” or SELECT-based operations. Table JOINS, stored procedure parameters (for row filtering) and return types are similar among a focused set of stored procedures. The focused stored procedure group resembles the following pattern: GetEntityById, GetEntitiesByParentId, GetEntityByExactName, GetEntitiesByPartialName, GetEntityBySomeOtherParameterCombo.

Use the T-SQL COALESCE function in the procedure’s WHERE clause logic to implement optional parameters and combine the many similar, individual procedures into a single procedure with functionality for multiple input parameter combinations that yield a single return type.”

 

If implemented properly, the approach I detailed could significantly reduce the number of similar SELECT stored procedures in the database making your life easier as a developer having fewer procedures to maintain. I recently discussed this approach with our Head DBA and he had some reservations claiming that function usage in WHERE clauses could lead to performance issues. With a little more insight, I wanted to take an opportunity to evaluate my original procedure example and refactor it if I could find a better way.

 

Optimization Guidelines for WHERE Clauses

After an hour or so of research, AdventureWorks testing and several articles later (see Resources section at the end of this post), I’ve come up with a few guidelines to help squeeze the best performance out of your procedure execution – at least as far as your WHERE clause is concerned.

  • Know How to Read Your Execution Plan – There’s probably no better tool for performance feedback and optimization than SQL Server’s query execution plan viewer. This will present a visual representation of SQL Server’s query execution operations along with valuable metrics such as execution time, CPU cost, I/O cost, number of rows processed, etc. Seek out and destroy/minimize all Table or Index Scans!
  • Make Your WHERE Clause Columns Immediately Visible to SQL’s Query Optimizer – In other words, don’t bury your filter columns inside a function. Instead call-out the column followed by whatever function you need to complete the expression enabling the Query Optimizer to identify and use any indexes related to the column. Here’s a couple of examples: 
    • DO – WHERE Person.LastName LIKE ‘Beck%’
    • DON’T – WHERE SUBSTRING(Person.LastName, 5) = ‘Beck’
    • DO – WHERE Person.DOB > DATEADD(yyyy, –21, GETDATE())
    • DON’T – WHERE DATEDIFF(yyyy, Person.DOB, GETDATE()) > 21
  • Avoid Nested AND/OR Logic – There may be a creative way to eliminate the use of simple OR chains in your WHERE clauses. If there is, you should use it. While this isn’t a hard-and-fast rule, it is what I identified in my example procedure as something that could be optimized. By rethinking the expression, I was able to reduce the query cost of an index scan on Address.City. Aside from a performance increase, WHERE clause logic will be easier to read if your expressions are simpler. You should treat nested WHERE clause logic in your T-SQL the same way you should treat nested IF statements in your application code – eliminate it with extreme prejudice.

 

Let’s Review the Original Example

Below is the original stored procedure example in its entirety. According to the three guidelines above, the procedure has already called-out the filter column in the WHERE clause making it immediately visible to the Query Optimizer; I’ve evaluated the execution plan (actual output is below); and I’ve identified an ugly nested OR that can be cleaned up…

-- ==========================================================================================
-- Author:      Keith Beckman
-- Create date: 1/7/2011
-- Description: Selects complete address data based on a variety of input criteria.
-- ==========================================================================================
CREATE PROCEDURE [Person].[GetAddresses]
    @AddressId            INT = NULL
    , @PartialCity        NVARCHAR(30) = NULL
    , @StateProvinceId    INT = NULL
AS
BEGIN
 
    SET NOCOUNT ON;
 
    SELECT TOP 20    
        A.AddressID
        , A.AddressLine1
        , A.City
        , A.PostalCode
        , SP.Name
        , CR.Name
    FROM    Person.Address A
            INNER JOIN Person.StateProvince SP ON SP.StateProvinceID = A.StateProvinceID
            INNER JOIN Person.CountryRegion CR ON CR.CountryRegionCode = SP.CountryRegionCode
    WHERE    A.AddressID        = COALESCE(@AddressId, A.AddressID)
        AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
        AND((@PartialCity IS NULL AND A.City = A.City) OR A.City LIKE '%' + @PartialCity + '%')
END

GO

 

Below is the SQL Server actual query plan results for the following procedure call…

--Get all addresses with a city name like "Birm"...
--Replaces: GetAddressesByCityName
EXECUTE AdventureWorks.Person.GetAddresses 
  @PartialCity = 'Birm'
GO

 

image  image

 

One thing to note, using the AdventureWorks database is a contrived example and due to the database size any optimizations will be unnoticeable to the naked eye. However, when you’re working with extremely large tables in a high traffic DB, minor enhancements like the following can make a big difference. Notice the Index Scan above. At 65% of the entire query cost, this could be a big bottleneck under the right conditions.

 

Let’s Tweak the WHERE Clause

WHERE    A.AddressID       = COALESCE(@AddressId, A.AddressID)
   AND A.StateProvinceID   = COALESCE(@StateProvinceId, A.StateProvinceID)
   AND A.City              LIKE '%' + COALESCE(@PartialCity, A.City) + '%'

 

As I mentioned earlier, all we really needed to do for this query is optimize the WHERE clause filter for the Address.City column. Instead of the nested OR that evaluates whether or not the @PartialCity parameter IS NULL and then builds a LIKE expression, I made the entire check a LIKE expression that was immediately aware of the target column, Address.City. Executing the same procedure call against the optimized procedure produced the execution plan results below.  As you can see from the Clustered Index Scan statistics (previously the most expensive part of the query as defined by Cost %) the Estimated I/O Cost and Estimated CPU Cost figures were reduced dramatically. I didn’t eliminate the Clustered Index scan completely, but it is a significantly less percentage of the overall execution cost. The NonClustered Index Seek operation increased by a factor of 6, but since this is always a faster operation than a Scan, it’s more efficient for the overall query execution.

image   image

 

In Closing

There’s always room for refactoring in every line of code you write. Sometimes it will be a year or more before your original code becomes a problem. Or it could take that long for you to circle back around to it… Check your ego at the door and question everything. No matter how good you think you are or how good you really are, you’ve written code that can be improved for both performance and readability.

 

Resources

The Cost of Function Use in a WHERE Clause

Avoid SQL Server Functions in the Where Clause for Performance

Execution Plan Basics

DateTime DSL: Promoting Testability by Stubbing Static Dependencies

12.16.2011 08:35 by kbeckman | Comments

C4SC DSL Series Catalog

GitHub Repo

 

In the last post, I closed by describing the timing problem when the DSL’s unit tests evaluated expected values calculated DateTime.Now. When the unit tests regained the execution focus back from the DSL after the DateTime calculation, the value of DateTime.Now had incremented slightly by a few milliseconds throwing off the expected results. We’ll be taking a hand-rolled approach to solving the issue here…

 

Overview

Your static class has a static dependency that is either very difficult or impossible to test around. The dependency might be a base type in the .NET Framework or it could be a static type that you may or may not own.

 

Instead of calling the static dependency directly, use the Adapter Pattern to wrap the static dependency call in an instance method of a class implementing a common public interface. Provide an injection point in your static class where you can provide a stub if necessary for use in unit testing.

 

Motivation

I arrived a hand-rolled solution for resolving the static DateTime.Now dependency because it seemed to fit the situation the best. I could have solved this a couple of other ways, but I’ll explain why I think the hand-rolled approach is better in this case. This issue could have easily been solved by using a mocking or isolation framework that provides static mocking or .NET Framework base type mocking. My favorite tool for that job is Telerik’s JustMock, but like I said in the last post, the free version doesn’t support static mocking and I don’t want to add a $299 dependency for the professional version to my open source DSL.

 

Mechanics (Examples Using Actual DSL Implementation)

1) Create a public interface that defines the functionality provided by the static dependency.

/// <summary>
/// Defines a service for providing the DateTime.Now value. This can be used in testing scenarios to provide a 
/// testable DateTime.Now value configurable from unit test setup methods.
/// </summary>
public interface IDateTimeNowAdapter
{
    DateTime DateTimeNow();
}

 

2) Implement the public interface and delegate all methods defined in the interface to the static dependency.

/// <summary>
/// IDateTimeNowAdapter implementation that provides the system's actual DateTime.Now value.
/// </summary>
public class SystemDateTimeNowAdapter : IDateTimeNowAdapter
{
    public DateTime DateTimeNow()
    {
        return DateTime.Now;
    }
}

 

3) Provide an alternate implementation to the public interface that allows for testability. This particular implementation takes any instance of DateTime as a constructor parameter argument. When this Type is instantiated in the unit tests, the call to the DateTimeNow() method will always provide the DateTime instance provided at instantiation and allow the test methods access to the exact DateTime value used for the DateTimeNow() calls within the test target class.

/// <summary>
/// IDateTimeNowAdapter implementation that provides configurable DateTime.Now value for use in unit testing.
/// </summary>
public class TestDateTimeNowAdapter : IDateTimeNowAdapter
{
    private readonly DateTime _dateTimeNowValue;

    public TestDateTimeNowAdapter(DateTime dateTimeNowValue)
    {
        _dateTimeNowValue = dateTimeNowValue;
    }

    public DateTime DateTimeNow()
    {
        return _dateTimeNowValue;
    }
}

 

4) Modify the test target class to use an instance of the Type constructed in Step 2. Be sure to use the public interface definition rather than a hard reference to the instance Type. Delegate all calls initially made to the static dependency to the public interface of the instance Type. This will break the immediate dependency your test target class has on the static dependency allowing you to provide an alternate implementation when needed for the tests.

Before…

/// <summary>
/// Calculates the result of DateTimeComponents added to DateTime.Now().
/// </summary>
/// <param name="components"><see cref="DateTimeComponents"/> operand.</param>
/// <returns>Future DateTime calculated from DateTime.Now().</returns>
public static DateTime FromNow(this DateTimeComponents components)
{
    return components.From(DateTime.Now);
}

/// <summary>
/// Calculates the result of DateTimeComponents subtracted from DateTime.Now().
/// </summary>
/// <param name="components"><see cref="DateTimeComponents"/> operand.</param>
/// <returns>Past DateTime calculated from DateTime.Now().</returns>
public static DateTime Ago(this DateTimeComponents components)
{
    return components.AgoFrom(DateTime.Now);
}

After…

/// <summary>
/// Extension methods to calculate DateTime future or past result from DateTimeComponents operand.
/// </summary>
public static class DateTimeComponentsConversionToDateTime
{
    private static IDateTimeNowAdapter _nowAdapter = new SystemDateTimeNowAdapter();

    //Additional class implementation details omitted for this example...

    /// <summary>
    /// Calculates the result of DateTimeComponents added to DateTime.Now().
    /// </summary>
    /// <param name="components"><see cref="DateTimeComponents"/> operand.</param>
    /// <returns>Future DateTime calculated from DateTime.Now().</returns>
    public static DateTime FromNow(this DateTimeComponents components)
    {
        return components.From(_nowAdapter.DateTimeNow());
    }

    /// <summary>
    /// Calculates the result of DateTimeComponents subtracted from DateTime.Now().
    /// </summary>
    /// <param name="components"><see cref="DateTimeComponents"/> operand.</param>
    /// <returns>Past DateTime calculated from DateTime.Now().</returns>
    public static DateTime Ago(this DateTimeComponents components)
    {
        return components.AgoFrom(_nowAdapter.DateTimeNow());
    }    
}

 

4) Provide an internally scoped injection point in your test target class allowing you to replace the default public interface implementation with the Type defined in Step 3. A couple of things here… I’m providing an internally scoped method in this example because I don’t want anything publicly available that consumers of this DSL might be able to use to alter the expected behavior. This injection point is something I want available only to unit tests (described in Step 5). You’ll notice that I’m using the C# lock keyword here to protect the dependency injection point. Since this is a static class and the IDateTimeNowProvider is a static instance, synchronization must be provided to ensure other threads aren’t accessing the same resource we’re trying to swap out here.

 

A seasoned developer might ask why I haven’t just implemented a IoC container to provide the configurable dependency injection here. The short answer is that I felt it would be overkill to implement it for an adapter pattern that simply wraps a call to DateTime.Now. An IoC container would provide a lot of overhead for very little impact here. Additionally, I believe that when you are building frameworks or reusable components, there is no reason to introduce additional dependencies like an IoC container. I would prefer that someone could drop in this DSL in as-is without having to include any other DLL dependencies. IoC containers are better left for actual application code with more involved dependencies.

private static readonly object _syncLock = new object();
        
/// <summary>
/// Sets the DSL's IDateTimeNowAdapter. This is the injection point for tests to provide their own adapter
/// implementation.
/// </summary>
/// <param name="adapter"><see cref="IDateTimeNowAdapter"/> implementation.</param>
internal static void SetDateTimeNowAdapter(IDateTimeNowAdapter adapter)
{
    lock (_syncLock) { _nowAdapter = adapter; }
}
 

5) Set the InternalsVisibleTo attribute in the assembly containing your test target class to allow your unit test assembly access the internal methods of its Types. This will allow the unit test assembly to access the test target class’s internal methods just as if they were of public scope.

//AssemblyInfo.cs for the assembly containing the test target class..

[assembly: InternalsVisibleTo("Test.C4SC")]

 

6) Use the testable adapter class in your unit tests to replace the default adapter implementation. Notice here that the test class has a private instance IDateTimeNowAdapter field. This allows for DRY tests where the test target’s adapter can be set using NUnit’s SetUp and TearDown methods rather than in each individual test.

private IDateTimeNowAdapter _nowAdapter;

/// <summary>
/// Executes individual Test-level setup and initialization prior to each test being run.
/// </summary>
[SetUp]
public void SetUpTest()
{
    //Replace the test target's default IDateTimeNowAdapter...
    nowAdapter = new TestDateTimeNowAdapter(DateTime.Now);
    DateTimeComponentsConversionToDateTime.SetDateTimeNowAdapter(_nowAdapter);
}


/// <summary>
/// Executes individual Test-level clean-up after each test has been run. This method is guaranteed to run 
/// even if an exception is thrown.
/// </summary>
[TearDown]
public void TearDownTest()
{
    //Reset the test target's IDateTimeNowAdapter to the system adapter...
    DateTimeComponentsConversionToDateTime.SetDateTimeNowAdapter(new SystemDateTimeNowAdapter());
}

[Test]
public void it_should_add_1_month_to_now()
{
    DateTime actual      = 1.Month().FromNow();
    DateTime expected    = _nowAdapter.DateTimeNow().AddMonths(1);

    Assert.That(actual, Is.EqualTo(expected));
}

T-SQL Sproc Optional Parameters Using COALESCE

01.26.2011 22:42 by kbeckman | Comments

Overview

Your codebase has a glut of repeated stored procedure logic for “get” or SELECT-based operations. Table JOINS, stored procedure parameters (for row filtering) and return types are similar among a focused set of stored procedures. The focused stored procedure group resembles the following pattern: GetEntityById, GetEntitiesByParentId, GetEntityByExactName, GetEntitiesByPartialName, GetEntityBySomeOtherParameterCombo.

 

Use the T-SQL COALESCE function in the procedure’s WHERE clause logic to implement optional parameters and combine the many similar, individual procedures into a single procedure with functionality for multiple input parameter combinations that yield a single return type.

 

Motivation

I’m currently working in a codebase with an overwhelming amount of T-SQL stored procedures. There’s an individual stored procedure for every CRUD operation you can think of… Unfortunately, with this particular application, it doesn’t make a lot of sense to start swapping out stored procedures and ADO.NET code for a more sexy ORM. The existing data access patterns already work well and all of the developers are very familiar with it. In addition, there are areas of the codebase where the domain model is coupled a bit too tightly to the data access code. The refactoring effort to introduce an ORM would be more than the deadlines allow for. An alternative is to reduce the number of similar “get” stored procedures into a single procedure that provides all of your required functionality. This approach can significantly impact the overall procedure count and make both database management and stored procedure enhancement easier .

 

Creating a single stored procedure for all of your “get” operations is possible with WHERE clause usage of the T-SQL COALESCE function. COALESCE takes an expression list and applies the first item that is NOT NULL. The expression list allows for the addition of stored procedure parameters as well as column values. Initializing all of your optional parameters = NULL in the parameter declaration will initialize the parameter as NULL if a value is not given by a call to the procedure. Apply a filter in the WHERE clause using the COALESCE function that uses your optional parameter and provide an alternate expression (if your parameter evaluates to IS NULL) that will always logically evaluate to TRUE. This will make sense when you evaluate the example…

 

Mechanics

1) Create a new stored procedure that will replace all of the existing “get” procedures for a given entity. Name it with a generic name that doesn’t indicate what parameter combination is required – GetEntity.

2) Add all of the possible input parameters to the new procedure. All required parameters should be declared as you normally would declare them. All optional parameters should be initialized as NULL in the declaration.

 

CREATE PROCEDURE dbo.GetEntity
    @EntityId            INT = NULL
    , @PartialEntityName NVARCHAR(30)
    , @ParentEntityId    INT = NULL
AS
BEGIN
...

3) For all optional filtering parameters, use the T-SQL COALESCE keyword to build an expression in the WHERE clause that first uses your optional parameter if it is NOT NULL and some other filter value(s) if your optional parameter IS NULL.

 

WHERE SomeTable.EntityId            = COALESCE(@EntityId, SomeTable.EntityId)
    AND SomeTable.ParentEntityId    = COALESCE(@ParentEntityId, SomeTable.ParentEntityId)
    AND((@PartialEntityName IS NULL AND SomeTable.EntityName = SomeTable.EntityName) 
        OR SomeTable.EntityName LIKE '%' + @PartialEntityName + '%')

 

4) Replace the usage of an existing “get” procedure with your new stored procedure and test its behavior.

5) Make sure any related unit tests are passing.

6) Repeat steps 4 and 5 for any usages of your old “get” stored procedures.

7) Be sure to tune/update any existing indexes or create additional indexes to accommodate your search criteria.

 

Example

This example uses the AdventureWorks sample DB in SQL 2008.

 

SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- ==========================================================================================
-- Author:      Keith Beckman
-- Create date: 1/7/2011
-- Description: Selects complete address data based on a variety of input criteria.
-- ==========================================================================================
CREATE PROCEDURE Person.GetAddresses
    @AddressId            INT = NULL
    , @PartialCity        NVARCHAR(30)
    , @StateProvinceId    INT = NULL
AS
BEGIN

    SET NOCOUNT ON;

        SELECT TOP 20    
        A.AddressID
        , A.AddressLine1
        , A.City
        , A.PostalCode
        , SP.Name
        , CR.Name
    FROM    Person.Address A
            INNER JOIN Person.StateProvince SP ON SP.StateProvinceID = A.StateProvinceID
            INNER JOIN Person.CountryRegion CR ON CR.CountryRegionCode = SP.CountryRegionCode
    WHERE    A.AddressID             = COALESCE(@AddressId, A.AddressID)
            AND A.StateProvinceID    = COALESCE(@StateProvinceId, A.StateProvinceID)
            AND((@PartialCity IS NULL AND A.City = A.City) OR A.City LIKE '%' + @PartialCity + '%')
END
GO

 

Here’s some usage examples to get an idea of what’s possible with the dynamic input parameter stored procedure approach…

 

--Get all addresses with a city name like "Birm"...
--Replaces: GetAddressesByCityName
EXECUTE AdventureWorks.Person.GetAddresses 
  @PartialCity = 'Birm'
GO

--Get all addresses with a city name like "Birm" that are in Alabama...
--Replaces: GetAddressesByCityNameAndStateProvinceId
EXECUTE AdventureWorks.Person.GetAddresses 
    @PartialCity = 'Birm', 
    @StateProvinceId = 3
GO

--Get all addresses in Alabama...
--Replaces: GetAddressesByStateProvinceId
EXECUTE AdventureWorks.Person.GetAddresses 
    @StateProvinceId = 3
GO

--Get the address with AddressId = 49...
--Replaces: GetAddressById
EXECUTE AdventureWorks.Person.GetAddresses 
    @AddressId = 49
GO

 

As you can see, there are numerous usages under this approach (certainly not limited to what I’ve shown above). The best part about this is that you’ve significantly reduced the amount of similar “Get” procedures you need to write! Any time you need to add additional JOINS or WHERE clauses, you can add them in a single procedure rather than replacing logic that’s scattered all over your codebase.

 

Now you and your DBA can call it a day… He/she owes you a beer after this one. Or if they helped you to arrive at this approach like mine did – you owe them one.