Last summer I wrote about NOT EXISTS and how I could speed up the query by adding a left join and adding an IS NULL to the WHERE clause. Last week, I had a similar situation, but with a NOT IN. It was killing my performance. I wanted to remove all my test emails as I prepared my production email list and was calling them out specifically.
SELECT *
FROM dbo.SuperHeroes s
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
AND c.EmailAddress NOT IN ('Batman@TestEmail.com', 'Superman@TestEmail.com', 'WonderWoman@TestEmail.com', 'Aquaman@TestEmail.com')
So I tried my super cool trick:
CREATE TABLE #SuperHeroTestEmail (EmailAddress varchar(50));
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Batman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Superman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('WonderWoman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Aquaman@TestEmail.com');
SELECT *
FROM dbo.SuperHeroes s
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
LEFT OUTER JOIN #SuperHeroTestEmail em ON c.EmailAddress = em.EmailAddress
WHERE em.EmailAddress IS NULL;
Here I create my temp table so I can load in the email addresses that I want to remove. Then I do a LEFT JOIN on those email addresses and in my where clause I force that join to only bring me back the records that show NULL for the temp email table. This way, I am telling SQL exactly what I want it to bring back instead of telling it what I don’t want. This makes my query run faster.
I was so excited, but I started to test and noticed my counts were way off. I couldn’t figure out why until I realized my NOT IN was removing all my NULL email records. I don’t have email addresses for all of my super heroes! As soon as I figured out that, I knew what I had to do.
CREATE TABLE #SuperHeroTestEmail (EmailAddress varchar(50));
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Batman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Superman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('WonderWoman@TestEmail.com');
INSERT INTO #SuperHeroTestEmail (EmailAddress) VALUES ('Aquaman@TestEmail.com');
SELECT *
FROM dbo.SuperHeroes s
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
LEFT OUTER JOIN #SuperHeroTestEmail em ON c.EmailAddress = em.EmailAddress
LEFT OUTER JOIN dbo.Contact c2 ON c2.ContactId = c.ContactID AND c2.EmailAddress IS NULL
WHERE em.EmailAddress IS NULL
AND c2.ContactId IS NULL;
Here I am using the primary key in contact to join back to it but also am telling it in the join to only bring me records where the email is NULL. Then, I tell it in my where clause to only look at the records that where C2.ContactID IS NULL, this gets me around using an “IS NOT NULL” which is much slower in this case.
At the end, it cut my query run time in half, which was a win for me. The main idea is that you want to tell SQL Server what you want, not what you don’t want. It helps the engine be more precise and know exactly what you bring you. I compare this to going to a restaurant and being able to tell the waiter what dish I want instead of telling the waiter that I, “Don’t want anything with fish”. Then the waiter has to check every dish and ask me if that is what I want. It takes longer. Instead, I just tell the waiter and SQL Server what I want to consume.
The song today is a mashup of Taylor Swift (We are Never Getting Back Together) and Korn (Coming Undone) and has been living rent free in my head for weeks. Huge thanks to my awesome co-worker, Bree, that showed it to me.