I have been tuning queries and one of the first things I noticed was Sub-queries in the SELECT Clause.
SELECT (SELECT H1.FormalName
FROM HideOut H1
WHERE TypeDesc = 'Villain'
AND H1.Storyline = S.Storyline) AS VillainHideOut
, (SELECT H2.FormalName
FROM HideOut H2
WHERE TypeDesc = 'Hero'
AND H2.Storyline = S.Storyline) AS HeroHideOut
, S.HeroName AS Hero
, V.VillainName AS Villain
FROM dbo.SuperHero S
INNER JOIN [dbo].[Villain] V
ON S.HeroLink = V.HeroLink
WHERE V.IsActive = 1 AND S.IsActive = 1
Why is this query slow? If this query were to return 50 rows, it would run each query in the SELECT clause 50 times, and since there are two of them, that is 100 query runs. What if I returned 100,000 rows? That would be 200,000 query runs. How could I do this differently?
SELECT H1.FormalName AS VillainHideOut
, H2.FormalName AS HeroHideOut
, S.HeroName AS Hero
, V.VillainName AS Villain
FROM dbo.SuperHero S
INNER JOIN [dbo].[Villain] V
ON S.HeroLink = V.HeroLink
LEFT JOIN HideOut H1
ON H1.Storyline = S.Storyline AND H1.TypeDesc = 'Villain'
LEFT JOIN HideOut H2
ON H2.Storyline = S.Storyline AND H2.TypeDesc = 'Hero'
WHERE V.IsActive = 1 AND S.IsActive = 1
I moved the two correlated sub-queries into the JOIN clause and then simplified them to just join to the tables. This means I will only select from each table once, instead of for each row and will drop my query runs significantly. I could have left them as correlated sub-queries in the JOIN clause and it still would have performed better than having them in the SELECT clause.
In the query I was working with, I checked the statistics IO output and saw my logical reads on the HeroHideOut table drop from 24,103,280 logical reads down to 10,064 logical reads by making this one change.
Happy Tuning!
The song for this post is OneRepublic’s Run.
[…] Andrea Allred notes a potential performance issue: […]