Sunday 30 March 2014

Applied Postconventional Creativity and Deconstructed Stored Procedures

Prerequisite to this article is a great view of creativity, created by Arthur J. Cropley and presented by Jurgen Appelo, available here: http://www.noop.nl/2009/09/the-three-phases-of-creativity.html

As with most perspectives I read about, I look for ways to apply it effortlessly. In this case, such an opportunity was available the very next day. I was having an issue with an SQL stored procedure that was pulling too much data and crashing tempDB. Considering the entire database was 9GB, and I had temporarily inflated tempDB's max to a ludicrous 23GB, there was definitely an opportunity for refactoring. It also bears mention that it would take almost two hours to fail.

I've seen all too many of these sprocs during my career:

SELECT id, name, locationName, accountNumber, etc...
FROM companies
JOIN billingAccounts ON billingAccounts.compId = companies.Id
JOIN locations ON locations.compId = companies.Id
JOIN billingPeriods ON billingPeriods.Id = @selectedPeriodId
JOIN billingInvoices ON billingInvoices.PeriodId = billingPeriods.Id
JOIN billingItems ON billingItems.InvoiceId = billingInvoices.Id
WHERE
billingItems.Valid = 1 AND
locations.StillInOperation = 'Yeap' AND
billingInvoices.SignedOffBySomeone = 'CheckWithDave'

This style of sproc writing is hard to read, confusing, not maintainable, and doesn't easily explain the domain or model. SQL comes across as very un-structured in this form. There are some options for diagnosis right off the bat. First of all, you can get SQL Management studio to return the execution plan and look through it for table scans and opportunities to add clustered indexes. These are great options to help you figure out what's going on, but they're not my subject, so I'm going to blatantly ignore them and move on!

My colleague told me an anecdote about a way he had seen complex sprocs rewritten on a past project. The individual doing the rewriting wasn't incredibly familiar with optimal join patterns, and instead decided to break down massive data pulls into step-by-step temp table update statements. The pattern was so non-standard that I immediately associated it with pre conventional creativity. In my experience, there is a widespread common understanding that big data selects mean big swampy JOINs and stacks of WHERE constraints, but it seems that one man didn't have that wall of preconception holding them back.

This has everything I love in a good work story: a guy who knows a guy, applications of cognitive theory, and a new way of using a tool that's already in my tool belt.

The pattern goes like this:
1. Your first select is your 'one-per-row' select. If you're looking for billing items, don't start with the companies table. Start with your subject.
2. Select your 'one-per-row' results into a #temp table. Grab any foreign keys that you'll need for joins. You may join one and only one other table in this initial select.
3. All other columns that you'll want from other tables are casted nulls. These will be filled in later by subsequent update statements that, in turn, join one table at a time and 'fill in the blanks'.
4. Make use of WHERE NOT foo IS NULL when appropriate to keep from blanket-updating your temp table.

My made-up report sproc from earlier would look like the following when refactored:

--Initial Select
SELECT
-- FKs
items.Id AS ItemId,
invoices.InvoiceId AS InvoiceId,
invoices.LocationId,
CAST(NULL AS BIGINT) AS CompanyId,
--Output fields
invoices.InvoiceNumber AS InvoiceNumber,
items.Quantity,
items.TotalPrice,
invoices.InvoiceDate,
CAST(NULL AS NVARCHAR(100)) AS BillingAccountType,
CAST(NULL AS NVARCHAR(100)) AS CompanyName,
CAST(NULL AS NVARCHAR(100)) AS LocationName,
CAST(NULL AS NVARCHAR(150)) AS LocationAddress,
etc...
INTO #BillingItemReport
FROM
BillingItems items
LEFT JOIN
BillingInvoices invoices ON items.InvoiceId = invoices.Id
WHERE
invoices.InvoiceDate = @invoiceDate

UPDATE
report
SET
CompanyId = locs.CompanyId,
LocationName = locs.LocationName,
LocationAddress = locs.BillingAddress,
FROM #BillingItemReport report
LEFT JOIN Locations locs ON report.LocationId = locs.Id

UPDATE
report
SET
LocationName = translations.LocationName
FROM #BillingItemReport report
LEFT JOIN LocationTranslations translations ON translations.LocationId = report.LocationId
WHERE NOT translations.LocationId IS NULL -- In case the translation doesn't exist

Etcetera, until you've filled in all of the casted null columns and have your whole report.

Have you ever gone out to dine at a cloth-napkin-type establishment and were offered a 'deconstructed' salad? It's quite possible to see the ingredients of the salad by looking at a tossed version, but you may have to look under a few leaves, and you probably won't be able to figure out exactly how much of each component was used. A deconstructed salad lays it all out on the plate so you can see exactly how much lettuce and dressing you're getting for sixteen dollars. This is also one of the many instances in life that you'll end up paying someone to not perform a service.

Deconstructed Stored Procedures have some of the same outcomes. The model is clear as day. The JOIN order and progression is dead simple because we're only allowed to lace in one table at a time. The casted nulls remind us of our final select and prevent mid-select CASE statements that can further muddy up the works. Last and most certainly not least, it's extremely fast because we're only ever pulling a tight stream of explicitly defined data from the base.

I hope that you've learned something. Thanks for reading.

Adam Krieger

No comments:

Post a Comment