Last week I was explaining to a junior analyst how SQL CTEs (common table expressions) work.
They’re persisted temporary data sets, that allow you to store a single query to go back to later in your script.
They’re underrated compared to the subquery, that seems to be what most analysts around me use. Here’s why I prefer to use CTEs when building SQL queries.
How do they work?
In the example below I’ve used a SQL CTE to:
- grab all the data from the Email Delivered table, told SQL to hold that in memory,
- then grabbed what I need from the Unsubscribe table, told SQL to hold that in memory too,
- then joined them together with flags from the Customers table
Where you see a WITH, is the CTE starting and then I’m naming them ‘delivered’ and ‘unsubs’ before starting to tell the CTE what I want to return:
; WITH delivered as
-- start the CTE with a semicolon to
terminate anything above
senddate between '2018-01-01' and '2018-01-31' ),
-- add a comma if you need to add a subsequent CTE
senddate between '2018-01-01' and '2018-01-31' )
-- no comma for the last CTE
'January' as monthdelivered,
count(distinct delivered.emailaddress) as [countofdelivered],
count(distinct unsubs.emailaddress) as [countofunsubd]
on delivered.email = unsubs.emailaddress
unsubs on delivered.email = unsubs.email
and delivered.emailid = unsubs.emailid
When I do the final ‘join everything together’ part I’m joining fields from the ‘delivered’ dataset such as ‘delivered.email’.
Why not just use a Subquery?
Here is an example of a Subquery. I don’t use them often because my brain doesn’t work that way. I would rather get all my datasets separately then join them all together.
The way I get my head around reading it is thinking about it from the inside out. It’s nesting everything you need together, but in my opinion, it tends to get ugly really quickly.
- The first step is to run the query in the centre starting ‘SELECT AccountID …’ to get all orders greater than 30 from the OrderHistory table.
- Then JOIN on the Account table to look up which Accounts were from New Zealand.
- Then the top SELECT runs to return all the fields from the ord dataset and the three columns I want to see from the Account table.
sales.orderhistory where ordervalue > '30'
on ord.accountid = acc.accountid
acc.country = 'New Zealand'
Why are they so great?
You can use them multiple times throughout your script and they are readable, you can return what you need then reference it later.
Why not just create a table?
Are there any negatives?
CTEs don’t last forever and can only be used in the query you’re currently in, unlike temp tables or views that can survive outside the current script.
What about performance?
SQL server will always decide for you, via the query planner, the best way to execute your query.
If you ask your friendly DBA which strategy to use, they will tell you ‘it depends’ because it does.
The CTE is all about readability, so if it works for you give it a try.
Photo by Pixabay on Pexels