The query was perfectly broken…

I have been teaching a T-SQL 101 class and for the homework, we asked the students to get all the records where our heroes had a birthdate between 1995 through 1999. I expected something like this:

SELECT FirstName, LastName, Birthdate
FROM Heroes
WHERE Birthdate BETWEEN '1/1/1995' AND '12/31/1999'

OR

SELECT FirstName, LastName, Birthdate
FROM Heroes
WHERE Birthdate >= '1/1/1995' AND Birthdate <= '12/31/1999'


Imagine my surprise when one of the students turned in this:

SELECT FirstName, LastName, Birthdate
FROM Heroes
WHERE Birthdate BETWEEN '1995' AND '1999'

When I first saw the query I thought, “There is no way they ran that and it worked.” So I wrote it up and ran it on my data. Guess what? IT RUNS AND RETURNS DATA! I was shocked.

I started looking at the plan and what it did to the data and found that it had done an implicit conversion on the dates and assumed 1/1/1995 to 1/1/1999 based on the year. So we were missing data from the results, but I was still in shock that it had run in the first place and shared this information with my co-worker who reminded me that dates get crazy and if I only put in ’12/31/1999′ and there is a time in the field, it will cut off most of the times within that day because it will assume I want ’12/31/1999 00:00:00′. If I want the full day, I need to get in the habit of specifying ’12/31/1999 23:59:59′ or ‘1/1/2000 00:00:00’ and then test my results to make sure I am getting what I truly want back from the database.

The song for this post is BANNERS – Perfectly Broken.

We are never coming undone with NOT IN…

Last summer I wrote about NOT EXISTS and how I could speed up the query by adding a left join and adding an IS NULL to the WHERE clause. Last week, I had a similar situation, but with a NOT IN. It was killing my performance. I wanted to remove all my test emails as I prepared my production email list and was calling them out specifically.

SELECT *
FROM dbo.SuperHeroes s 
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
  AND c.EmailAddress NOT IN ('Batman@TestEmail.com', 'Superman@TestEmail.com', 'WonderWoman@TestEmail.com', 'Aquaman@TestEmail.com')

So I tried my super cool trick:

CREATE TABLE #SuperHeroTestEmail (EmailAddress varchar(50));
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Batman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Superman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('WonderWoman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Aquaman@TestEmail.com');

SELECT *
FROM dbo.SuperHeroes s 
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
LEFT OUTER JOIN #SuperHeroTestEmail em ON c.EmailAddress = em.EmailAddress
WHERE em.EmailAddress IS NULL;

Here I create my temp table so I can load in the email addresses that I want to remove. Then I do a LEFT JOIN on those email addresses and in my where clause I force that join to only bring me back the records that show NULL for the temp email table. This way, I am telling SQL exactly what I want it to bring back instead of telling it what I don’t want. This makes my query run faster.

I was so excited, but I started to test and noticed my counts were way off. I couldn’t figure out why until I realized my NOT IN was removing all my NULL email records. I don’t have email addresses for all of my super heroes! As soon as I figured out that, I knew what I had to do.

CREATE TABLE #SuperHeroTestEmail (EmailAddress varchar(50));
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Batman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Superman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('WonderWoman@TestEmail.com');
  INSERT INTO #SuperHeroTestEmail  (EmailAddress) VALUES ('Aquaman@TestEmail.com');

SELECT *
FROM dbo.SuperHeroes s 
INNER JOIN dbo.Contact c ON s.ContactId = c.ContactId
LEFT OUTER JOIN #SuperHeroTestEmail em ON c.EmailAddress = em.EmailAddress
LEFT OUTER JOIN dbo.Contact c2 ON c2.ContactId = c.ContactID AND c2.EmailAddress IS NULL
WHERE em.EmailAddress IS NULL
AND c2.ContactId IS NULL;

Here I am using the primary key in contact to join back to it but also am telling it in the join to only bring me records where the email is NULL. Then, I tell it in my where clause to only look at the records that where C2.ContactID IS NULL, this gets me around using an “IS NOT NULL” which is much slower in this case.

At the end, it cut my query run time in half, which was a win for me. The main idea is that you want to tell SQL Server what you want, not what you don’t want. It helps the engine be more precise and know exactly what you bring you. I compare this to going to a restaurant and being able to tell the waiter what dish I want instead of telling the waiter that I, “Don’t want anything with fish”. Then the waiter has to check every dish and ask me if that is what I want. It takes longer. Instead, I just tell the waiter and SQL Server what I want to consume.

The song today is a mashup of Taylor Swift (We are Never Getting Back Together) and Korn (Coming Undone) and has been living rent free in my head for weeks. Huge thanks to my awesome co-worker, Bree, that showed it to me.

Transactions follow me left and right but who did that over here?

I have been working my way through a fantastic training on SQL Internals and when I saw this trick, I had to write it down so I wouldn’t forget it.

Say you have a user come to you and they dropped a table sometime yesterday, but they don’t remember when and now they need it back. You could start the restore process and roll through logs until you see the drop and then restore to the hour before or you could run this super cool query to get the time the table was dropped.

(Before I ran this, I set up a test database, created a table, filled it with stuff, took a full backup and a transaction log backup, dropped the table and then took another transaction log backup)

SELECT [Current LSN]
		,[Operation]
		,[Context]
		,[Transaction ID]
		,[Description]
		,[Begin Time]
		,[Transaction SID]
FROM fn_dblog (NULL,NULL)
INNER JOIN(SELECT [Transaction ID] AS tid
FROM fn_dblog(NULL,NULL)
WHERE [Transaction Name] LIKE 'DROPOBJ%')fd ON [Transaction ID] = fd.tid

See that Begin Time? We want to roll our logs forward to right before that started. How cool is that?!!! Nearest point in time recovery that is possible all because of reading through the log to see when the drop occurred.

But this next part was the piece that blew my mind. What if I didn’t know who dropped the table, but wanted to talk to them so they didn’t do it again? I can add one more column to my query.

SELECT [Current LSN]
		,[Operation]
		,[Context]
		,[Transaction ID]
		,[Description]
		,[Begin Time]
		,[Transaction SID]
		,SUSER_SNAME ([Transaction SID]) AS WhoDidIt
FROM fn_dblog (NULL,NULL)
INNER JOIN(SELECT [Transaction ID] AS tid
FROM fn_dblog(NULL,NULL)
WHERE [Transaction Name] LIKE 'DROPOBJ%')fd ON [Transaction ID] = fd.tid

I am passing that Transaction SID into the SUSER_SNAME built in function.

Probably shouldn’t be surprised by that answer.

The song for this post is Left and Right by Charlie Puth.

You’re like the calmest failing job, I need you louder with a DPA alert…

I have an RDS instance that when backups are failing, we have no idea. We use DPA as one of our alerting systems, but with RDS, failing agent jobs are harder to find. I took the built in RDS status stored procedure and adjusted it enough to send me an alert through DPA so I can know that there are problems.

There are a couple of things I have already done, I have set up my RDS instance to talk to DPA, I have granted my DPA user access to run what it needs in RDS and I have set up a user in DPA that can email\call me when there is a problem. I used this post about DPA to help me get the basics done and then I did a lot of trial and error on my code to get the alert working just right.

This is the basic status stored proc that is telling me if my backups are working or not:

exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

I tried a few things and finally realized I needed to drop it into a temp table so that I could filter it down to only get what I need:

CREATE TABLE #tempAlert 
(task_id	bigint
,task_type	varchar(200)
,database_name	varchar(200)
,[% complete] int
,[duration(mins)] int	
,lifecycle varchar(20)
,task_info	varchar(8000)
,last_updated	datetime2
,created_at	datetime2
,S3_object_arn	varchar(2000)
,overwrite_S3_backup_file	int
,KMS_master_key_arn	varchar(200)
,filepath	varchar(200)
,overwrite_file int);

INSERT INTO #tempAlert
exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

SELECT *
FROM #TempAlert
WHERE task_type = 'BACKUP_DB_DIFFERENTIAL' AND last_updated > GETDATE()-1 AND lifecycle <> 'SUCCESS';

DROP TABLE #TempAlert;

This code is specifically looking for any diff failures in the last day, if I want fulls, I switch task_type = ‘BACKUP_DB_DIFFERENTIAL’ to task_type = ‘BACKUP_DB’.

In DPA, I go to ALERTS >>Manage Alerts and select “Custom” for my Alert Category and “Customer SQL Alert – Multiple Numeric Return” for my Alert Type then select “Create Alert”.

I gave it a name that was descriptive and changed my Execution Interval to once a day. I don’t want it to be firing all the time against my RDS instance and running up my bill. I could run it more often if I wanted. I added some text to my “Notification Text” box that will be helpful to anyone that gets the alert. I selected my Database instances that I wanted this alert to execute against. In the “SQL Statement” I pasted the bit of code above that creates the temp table, runs the stored procedure and inserts the results into that temp table, then filters the results and finally drops the temp table.

Finally, in the Alert Level and Notifications section, I set the “High” row of “Min” to a value of one and the “Max” I left empty, then added my Notification Group. I tested it to make sure all the connections were working properly and then I saved it. “I am the backup, I am the failure and I am the alert”.

The song for this post is Matt Maeson – Mr. Rattlebone

I’m Gonna Spend My Time Speeding that Query Up, Like It’s Never Enough, Like it’s Born to Run…

Have I mentioned that I like query tuning? One of my favorite tuning tricks is removing Sub-queries from WHERE clauses. Let me give an example:

SELECT HeroName
	,HasCape
	,FavoriteColor
	,LairId
FROM [dbo].[SuperHero] s 			
WHERE HeroType = 2
AND NOT EXISTS(SELECT 1 
		FROM [dbo].[SuperHero] x 								
		WHERE x.HeroID = s.HeroID 
			 AND x.IsHuman = 1 AND x.Weakness = 'Lack of Control')

Notice the “NOT EXISTS *Sub-Query* section. Any time I see this pattern or even a “NOT IN *Sub-Query*” pattern, I know I can fix it like this:

SELECT s.HeroName
		, s.HasCape
		, s.FavoriteColor
		, s.LairId
FROM [dbo].[SuperHero] s 
	LEFT JOIN [dbo].[SuperHero] x ON x.HeroID = s.HeroID 
		 AND x.IsHuman = 1
		 AND x.Weakness = 'Lack of Control'	
WHERE HeroType = 2
	AND x.HeroId IS NULL

In this second example, I have moved the sub-query to be in a LEFT JOIN with the same criteria and then in the WHERE I use one of the columns that should be populated (I favor ID columns here) and look to see if it “IS NULL”. That “IS NULL” works the same way as the “NOT EXISTS” and the “NOT IN”.

This allows me to remove the non-sargable arguments from the where clause and takes my query from non-sargable to sargable. (from Wikipedia- The term is derived from a contraction of Search ARGument ABLE. A query failing to be sargable is known as a non-sargable query and typically has a negative effect on query time, so one of the steps in query optimization is to convert them to be sargable).

With simple queries that have a low number of records, I hardly notice a difference in performance. As the queries become more complex or the row numbers increase, the difference begins to show in the query run time and the IO statistics.

The song for this post is I’m Born to Run by American Authors.

I’m Beggin, Beggin you, to stop using VarChars as IDs

As I was troubleshooting a performance issue, I noticed that there was an implicit conversion (SQL Server automatically converts the data from one data type to another) happening in my join. The join was on a column that was named the same in both tables, but one was datatype INT (integer) and the other was a datatype of VARCHAR(50) (variable character up to 50 places).

While the implicit conversion was happening transparently to our coders and users, it was causing performance impacts to the query. I wanted to change the datatype from VARCHAR(50) to an INT, not only to match the other table, but also because INTs are faster to join on than VARCHARs in the SQL engine.

My first step was to make sure there weren’t any values in the column that would have an issue changing to an int. For this task, I am using TRY_CAST to make my life easier.

SELECT TRY_CAST(SuperHeroId as INT) as Result, SuperHeroId
FROM dbo.Lair
WHERE TRY_CAST(SuperHeroId as INT) IS NULL
AND SuperHeroId IS NOT NULL

The TRY_CAST above is checking to see if I can CAST the value as an INT. If it can’t, it will return a NULL value. My WHERE clause will help me quickly identify the values that are failing which will allow me to fix the data before I change the data type on the column.

Once my query doesn’t return any rows, I am ready to change my datatype, which will remove that implicit conversion and increase the performance of any queries using that join.

The song for this post is Beggin’ by Maneskin.

These are the ints that never die, we reseeded negative.

Recently we had a system go down because we ran out of integers. (Mental note to create an alert when we are getting near to running out of integers.) If we upped the column to a bigint we were going to have to drop all the indexes. This server doesn’t have the capacity needed to do an operation of that size and it was estimated that it would be down for 8 hours while we dropped indexes, upped the column type and added back the indexes. This was way too long as it was early in the work day (had it been evening, it would have been fine).

There was also a concern about how many stored procedures were expecting an int but would need to be modified to a bigint, along with any code. That was a big undertaking and we were in an emergency down.

We talked about adding an additional table to take over but again, we were going to need to adjust a lot of things. During our discussion, I was reviewing this awesome blog post by Ed Pollack.

We decided that the fastest temporary solution was to reseed starting with the the smallest of negative ints -2147483648. Our table was already counting up with an increment of 1 and so we picked the smallest of numbers. That means we will seed with -2147483648, then go to -2147483647 and up to 0. We will have to watch closely as we approach 0 to not spill over to the positive numbers that have already been used.

This is only a temporary solution because of that spill over. Our real solution is to do the work and change that column to a bigint.

Here is a sample of what our code looked like:

DBCC CHECKIDENT ('dbo.MyTable', RESEED, -2147483648);

Because we chose this option, there was no need to drop indexes or fix code, but we are in the process of getting all of that ready for a production release.

Within seconds of reseeding, we were back online and working. It was absolutely a day I will remember.

The song for this post is Avicci’s The Nights.

I Made That Slow Query Run, Run, Run

I have been tuning queries and one of the first things I noticed was Sub-queries in the SELECT Clause.

SELECT (SELECT H1.FormalName
			FROM HideOut H1
			WHERE TypeDesc = 'Villain' 
			AND H1.Storyline = S.Storyline) AS VillainHideOut
     , (SELECT H2.FormalName
			FROM HideOut H2
			WHERE TypeDesc = 'Hero' 
			AND H2.Storyline = S.Storyline) AS HeroHideOut
     , S.HeroName AS Hero
     , V.VillainName AS Villain
FROM dbo.SuperHero S
	INNER JOIN [dbo].[Villain] V
		ON S.HeroLink = V.HeroLink
WHERE V.IsActive = 1 AND S.IsActive = 1

Why is this query slow? If this query were to return 50 rows, it would run each query in the SELECT clause 50 times, and since there are two of them, that is 100 query runs. What if I returned 100,000 rows? That would be 200,000 query runs. How could I do this differently?

SELECT H1.FormalName AS VillainHideOut
     , H2.FormalName AS HeroHideOut
     , S.HeroName AS Hero
     , V.VillainName AS Villain
FROM dbo.SuperHero S
	INNER JOIN [dbo].[Villain] V
		ON S.HeroLink = V.HeroLink
	LEFT JOIN HideOut H1 
		ON H1.Storyline = S.Storyline AND H1.TypeDesc = 'Villain'
	LEFT JOIN HideOut H2 
		ON H2.Storyline = S.Storyline AND H2.TypeDesc = 'Hero'
WHERE V.IsActive = 1 AND S.IsActive = 1

I moved the two correlated sub-queries into the JOIN clause and then simplified them to just join to the tables. This means I will only select from each table once, instead of for each row and will drop my query runs significantly. I could have left them as correlated sub-queries in the JOIN clause and it still would have performed better than having them in the SELECT clause.

In the query I was working with, I checked the statistics IO output and saw my logical reads on the HeroHideOut table drop from 24,103,280 logical reads down to 10,064 logical reads by making this one change.

Happy Tuning!

The song for this post is OneRepublic’s Run.

I can query multiple instances, I am King!

In the past, I have talked about CMS (Central Management Servers), but now I don’t have CMS configured and still want to query multiple instances at once. Local Server Groups are my friend.

In SSMS, I start by selecting View>>Registered Servers.

I then right click on “Local Server Groups” and select “New Server Group”.

Next I right click on the group I just created, in this case “Production” and select “New Server Registration”. I then fill in my servername, the type of Authentication, in this case I am using SQL Server Authentication and my login/password. I also am saving my password. This will help in the future. The Registered Server Name can be different. In the real world, my servernames are weird and so the Registered Server Name is the easy to remember name or the nickname I use for the server (all of my servers have nicknames). The description will come up when I hover over the server name once I have it registered.

Then I repeat this process until I have registered all my servers for Production under the Production group.

Now comes the cool part. I right click on my Production Server Group and select “New Query”. Because I saved my password, it connects to all my Production instances in one window. By default, it creates a pink bar at the bottom showing how many instances connected and the name of the Server Group.

Now I can run all my queries at once and the results will have the instance name prepended to each row. Word of warning, I never leave this connection open. I open it when I need it and then close it again so I don’t accidentally run something against all my servers.

The song for this post, King, is by Florence + The Machines.

Hit me with them good vibes, CTfP is set nice. Everything is so fire, little bit of sunshine!

Cost Threshold for Parallelism (CTfP) is one of my favorite server level settings in SQL Server. I remember the first time I heard this setting mentioned by Grant Fritchey. I quickly hopped on my servers and found them all set at the default (5) and adjusted them to 50 for the non SSRS servers and 30 for the SSRS ones. That was many years ago, but I had kept those numbers in my head because I didn’t know a better way.

Peter Shore gave an awesome presentation on Waits to our user group last week and reminded me of how much this setting can impact tuning. He also pointed us to a fantastic blog by Jonathan Kehayias about how to know the correct setting for your CTfP.

Peter explained that as I ran Jonathan’s awesome query, I would start to see a point in the StatementSubTreeCost column to help me identify the best CTfP for my environment.

My first thought after looking at this query, “I am so glad Jonathan wrote it because with that much XML, I wouldn’t know if it were safe to run without that trust.”

Today, I gave it a go. I kicked off the query and held my breath. Then I started to turn blue and realized this would probably take a minute. It took about 15 minutes and I was happy I didn’t panic at the wrong disco. It runs in a read uncommitted state which prevents blocking (thank you so much!) and I ran sp_whoisactive over and over to be safe.

This is Jonathan’s query, but I recommend you read his article too because there was so much good information.

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED; 
WITH XMLNAMESPACES   
(DEFAULT 'http://schemas.microsoft.com/sqlserver/2004/07/showplan')  
SELECT  
     query_plan AS CompleteQueryPlan, 
     n.value('(@StatementText)[1]', 'VARCHAR(4000)') AS StatementText, 
     n.value('(@StatementOptmLevel)[1]', 'VARCHAR(25)') AS StatementOptimizationLevel, 
     n.value('(@StatementSubTreeCost)[1]', 'VARCHAR(128)') AS StatementSubTreeCost, 
     n.query('.') AS ParallelSubTreeXML,  
     ecp.usecounts, 
     ecp.size_in_bytes 
FROM sys.dm_exec_cached_plans AS ecp 
CROSS APPLY sys.dm_exec_query_plan(plan_handle) AS eqp 
CROSS APPLY query_plan.nodes('/ShowPlanXML/BatchSequence/Batch/Statements/StmtSimple') AS qn(n) 
WHERE  n.query('.').exist('//RelOp[@PhysicalOp="Parallelism"]') = 1 

After running it, I got back 43 records. I felt that was low until I remembered that our CTfP is set higher than my brain standard at 150. After looking over the results, I felt that 150 was about right for this environment. I didn’t stop there.

Jonathan had mentioned how he uses this query to identify what needs to be tuned, and since tuning is my favorite, I started to play with the queries to get them running better.

Huge THANK YOU to the awesome SQL Server Community that is always willing to share and teach! I love being able to find what I need from people that I trust to make my job easier and I couldn’t do it without all of you!

Hugs and please stay safe!

The song for this post is Sunshine by OneRepublic.