{"id":3056,"date":"2019-02-10T20:16:10","date_gmt":"2019-02-10T07:16:10","guid":{"rendered":"https:\/\/www.helenanderson.co.nz\/?p=3056"},"modified":"2020-08-24T22:07:07","modified_gmt":"2020-08-24T10:07:07","slug":"sql-concepts-from-a-to-z","status":"publish","type":"post","link":"https:\/\/helenanderson.co.nz\/sql-concepts-from-a-to-z\/","title":{"rendered":"SQL concepts from A to Z"},"content":{"rendered":"\n

It’s time for some jargon-busting.<\/p>\n\n\n\n

A big part of my role is to onboard and support junior data analysts. A lot of them have just started<\/a> using SQL, have come from the world of Excel<\/a> analysis and have self-taught the SQL basics.<\/p>\n\n\n\n

Here are some of those terms and concepts that pop up during training if you need a refresher or have a junior analyst <\/a>in your life that needs something to refer back to.<\/p>\n\n\n\n


\n\n\n\n

Alias<\/a>
Begin Transaction<\/a>
CTEs v Subqueries<\/a>
Design<\/a>
ETL<\/a>
Function<\/a>
Group By<\/a>
Heaped Storage<\/a>
Integrity<\/a>
Join<\/a>
Key<\/a>
Lock<\/a>
Massive Parallel Processing<\/a>
Normalisation<\/a>
OLTP v OLAP<\/a>
Privileges<\/a>
Query Plan<\/a>
Disaster Recovery<\/a>
System Tables<\/a>
Truncate v Drop<\/a>
Union<\/a>
View<\/a>
Window Function<\/a>
XML<\/a>
Year<\/a>
Zero<\/a><\/p><\/blockquote>\n\n\n\n


\n\n\n\n

Alias<\/h2>\n\n\n\n

When joining<\/a> tables, we need to state which column<\/a> from which table we want to match up, and which columns we want to return in the results. If there are columns with the same name we need to be specific about which column we want to return.<\/p>\n\n\n\n

select\n  orders.item, \n  inventory.item,\n  inventory.unitprice\nfrom \n  orders \ninner join \n  inventory \non orders.order_item = inventory.inventory_item\n<\/code><\/pre>\n\n\n\n

To make it quicker to type we can alias the two tables with something shorter.<\/p>\n\n\n\n

select\n  o.item, \n  i.item,\n  i.unitprice\nfrom \n  orders o\ninner join \n  inventory i \non o.order_item = i.inventory_item\n<\/code><\/pre>\n\n\n\n

Instead of having to type out the whole table name each time we want a new column added, we can alias them with the letter ‘o’ for orders and ‘i’ for inventory.<\/p>\n\n\n\n

Read more about JOINs and aliasing in this beginner-friendly post<\/a>.<\/p>\n\n\n\n


\n\n\n\n

Begin Transaction<\/h2>\n\n\n\n

SQL Transactions<\/a> are used to trap errors when making changes to tables. During an UPDATE or DELETE statement, the change is auto-committed.<\/p>\n\n\n\n

By wrapping the statement in a transaction we have the opportunity to ‘roll back’ or ‘commit’ when we are sure that it should be executed, or if a condition has been met.<\/p>\n\n\n\n

The following transaction will run in a block and commit if successful.<\/p>\n\n\n\n

begin transaction\n\nupdate orders\nset status = 'sent'\nwhere order_id = '12345'\n\nupdate orders\nset status = 'sent'\nwhere order_id = '54321'\n\ncommit transaction\n<\/code><\/pre>\n\n\n\n

Read more<\/a> about how these work<\/p>\n\n\n\n


\n\n\n\n

CTEs v Subqueries<\/h2>\n\n\n\n

CTEs (Common Table Expressions)<\/a> are a temporary, named result set we can come back to in the scope of a single SELECT, INSERT, UPDATE, DELETE or MERGE statement.<\/p>\n\n\n\n

I use them when dealing with large tables to. For example: get all the columns I need from the ’emailsent’ table, then get all the columns I need from the ’emailunsubscribe’ table. Then in a final step join them together.<\/p>\n\n\n\n

; -- start the CTE with a semicolon to terminate anything above\n\nwith sent as -- here is where you name the dataset\n\n(select \n  emailaddress,\n  emailid, \n  senddate\nfrom\n  marketing.emailsent\nwhere\n  senddate between '2018-01-01' and '2018-01-31'\n),  -- add a comma if you need to add a subsequent CTE\n\nunsubs as\n\n(select \n  emailaddress,\n  emailid, \n  senddate\nfrom\n  marketing.emailunsubscribe\nwhere\n  senddate between '2018-01-01' and '2018-01-31'\n) -- no comma for the last CTE\n\nselect\n  'January' as [monthdelivered],\n  c.country, \n  count(distinct sent.emailaddress) as [countofdelivered], \n  count(distinct unsubs.emailaddress) as [countofunsubd]\nfrom sent\nleft join \n  marketing.customers c on sent.email = unsubs.emailaddress\nleft join  \n  unsubs on sent.email = unsubs.email \n  and sent.emailid = unsubs.emailid\n<\/code><\/pre>\n\n\n\n

Chidiebere<\/a> has written an excellent series<\/a> on CTEs and how they compare with the subquery.<\/p>\n\n\n\n


\n\n\n\n

Design<\/h2>\n\n\n\n

Datamarts tables<\/a> are organised in one of two forms. A \u2018Star\u2019 schema and a \u2018Snowflake\u2019 schema made of two types of tables.<\/p>\n\n\n\n