I’m Going Under and This Time DBCC Can’t Save Me…

Over the weekend I was troubleshooting an issue and found that part of it was caused by corruption in one of my databases.  I have seen DBCC CheckDB fix corruption a few times and thought, “No problem, I’ve got this.”

I ran:


ALTER DATABASE CorruptDB SET SINGLE_USER WITH ROLLBACK IMMEDIATE;
-- Note: ALTER DATABASE isn't allowed inside an open transaction,
-- so the repair runs on its own here rather than wrapped in one.
DBCC CHECKDB ('CorruptDB', REPAIR_REBUILD);
ALTER DATABASE CorruptDB SET MULTI_USER;

I waited for it to finish because the rule is, “Let it finish”.

There were still errors. So I ran it again.

The same errors appeared, saying that some of them were waiting on others to be fixed first.  Nothing had actually been repaired, and my heart sank.  I had a moment of panic as I furiously googled.  I saw post after post telling me to restore from backup…Did I mention the small problem that the corruption had been silently preventing backups from completing for this database, without ever throwing an error?  My backups were over two weeks old, so restoring was not an option.

I found a post by Paul Randal from 2006 about the cases DBCC CheckDB can’t fix.  He mentioned that sometimes it is just an index or a statistic that is corrupted.  This gave me hope and I started tracing my error.  It pointed to a Change Data Capture table that was being used in our replication.  That gave me an idea: I removed replication and ran DBCC CheckDB again.  It was clean! No corruption! I breathed and took a backup.  Then I added replication back and ran DBCC CheckDB one more time.  Still clean! Everything was healthy and fine!
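If you are ever tracing an error like this yourself, the DBCC CheckDB output usually includes an object ID.  Here is a minimal sketch of turning that ID back into a schema and table name; the ID below is a hypothetical placeholder, so substitute the one from your own error message.

USE CorruptDB;
GO
-- 1234567 is a hypothetical object ID copied from the CHECKDB error output.
SELECT OBJECT_SCHEMA_NAME(1234567) AS SchemaName,
       OBJECT_NAME(1234567) AS TableName;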

Thank you to everyone for your past posts that saved my sanity, and thank you for DBCC CheckDB!!!


The song for this post is Someone You Loved by Lewis Capaldi

If I need to rearrange my fragmentation, I will for you….

Oh my goodness, I have been buried and learning tons about indexing.  Something super cool that I learned was how to tell if you need to adjust the fillfactors on your indexes.  My whole DBA career I have usually set them to 85 and forgotten them.  I learned that I can check my CommandLog and see how often each index is being rebuilt.  I am currently using Ola Hallengren’s maintenance solution, and when my indexes are rebuilt for maintenance (nightly, in this case), it is all logged to a table named CommandLog in the master database.  If an index is rebuilt every day, adjust the fillfactor down (say, 75) so that it rebuilds less often.  If it almost never rebuilds, adjust the fillfactor up (say, 90) so it gets rebuilt once in a while and you aren’t wasting space in the index.  Here is the query I am using to see how often an index has been rebuilt in the last 10 days.


SELECT Command, COUNT(StartTime) AS Rebuilt
FROM master.[dbo].[CommandLog]
WHERE CommandType LIKE '%Index%'
  AND StartTime > GETDATE() - 10
GROUP BY Command
ORDER BY Rebuilt DESC;

When I see ones with a Rebuilt number of 10, 9, or 8, I know those are rebuilding nearly every day.  I will adjust their fillfactors down so that they don’t rebuild as often.  My sweet spot right now is once a week, but that doesn’t work for every index (or environment).  This is where the art comes into the DBA world, and I have to think about what is best for the system.  If I have already adjusted something down to 75 and it is still rebuilding frequently, I will adjust it down further (65, then 55), and back up if it drops too low.  I also evaluate how much the index is used and how important it is to keep healthy.  If it is hardly ever used, do I want to waste fillfactor space on it?
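For reference, here is a minimal sketch of the actual adjustment; the index and table names are hypothetical placeholders, and the rebuild is what applies the new fillfactor.

-- IX_Orders_OrderDate and dbo.Orders are hypothetical names.
ALTER INDEX IX_Orders_OrderDate ON dbo.Orders
REBUILD WITH (FILLFACTOR = 75);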

Indexing is absolutely an art, but now I have a new brush for my kit by being able to query the CommandLog.

This post’s song is Particles by Nothing But Thieves

All of my friends say, “How much longer will it run?”

I have frequently needed to see how much longer a backup or a restore was going to run.  This little bit of code is something I use to help me plan.  I replace “BACKUP” with “RESTORE” when I want to see how much longer a restore will take.  I also use it when I am tracking a rollback: I remove the WHERE clause and get a long list, and the row that has a percentage but no clear explanation is usually the one I want.  Sometimes the time remaining is a lie.  I have had it imply 4 hours, when it took 13.  The comfort is that I can tell it is still working.


SELECT command, percent_complete,
'elapsed_minutes' = total_elapsed_time / 60000.0,
'remaining_minutes' = estimated_completion_time / 60000.0
FROM sys.dm_exec_requests
WHERE command LIKE 'BACKUP%';

The song for this post is “Just Another Girl” by The Killers

SELECT * Is Bittersweet…I’m not trying to hurt you, I just love to query….

Greetings!

In preparation for my Groupby.org presentation on Friday, I wanted to do a post about why SELECT * is bittersweet.

Let’s start with the bitter first:

  1. If I have a wide table (lots of columns, or columns with large data types) and I select them all, I am reading a lot more data than I probably need to, which eats up processor.  This will also take up space in my buffer cache and can knock other things out.  I am making this sound extra scary to discourage anyone from doing it for a whole table or even a lot of rows.
  2. Indexing goes out the window with SELECT *, because SQL will do a full table scan.  If I create an index using every column to make it work (and please don’t do this, it stores the table twice), someone will come along one day and add a column to my table, and it will be back to a full table scan again.  There will be a domino effect in my system leading to the end of the world, or at least poor performance.
  3. Using SELECT * in a Stored Procedure or View can seem simple and easy: I automatically have all the columns I need.  But what happens when I add a column to the table that lives underneath?  It can change the behavior, or even break the Stored Procedure or View and anything that relies on them.  Some of the issues may not show up at first; for example, it may sort differently, or I may get duplicate data.  Then I have to go back through everything trying to figure out what changed and where to fix it.  Huge headache ahead on this one (see the sketch after this list).
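Here is a small demonstration of that view gotcha, with hypothetical table and view names.  A SELECT * view binds its column list when it is created, so a column added later stays invisible until the view’s metadata is refreshed with sp_refreshview:

-- dbo.Demo and dbo.DemoView are hypothetical names for this demo.
CREATE TABLE dbo.Demo (A int);
GO
CREATE VIEW dbo.DemoView AS SELECT * FROM dbo.Demo;
GO
ALTER TABLE dbo.Demo ADD B int;
GO
SELECT * FROM dbo.DemoView;   -- still returns only column A
GO
EXEC sp_refreshview 'dbo.DemoView';
GO
SELECT * FROM dbo.DemoView;   -- now returns A and B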

We have seen the bitter, now let me show you the sweet:

  1. When I have a small table and can’t remember the names of all the columns, SELECT * is a quick way for me to see what the columns are named.
  2. When I need to see what the data looks like in the table, I can use SELECT TOP 5 * and get an idea of the data (see the sketch after this list).  By limiting the data I am bringing back, I don’t cause as many of the scary reads, which uses less processor, and I don’t fill up my buffer cache with useless stuff.
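A quick sketch of that workflow, using a hypothetical dbo.Orders table: peek at the shape of the data first, then write the real query with only the columns you need.

-- dbo.Orders and its columns are hypothetical placeholders.
SELECT TOP (5) *
FROM dbo.Orders;

SELECT OrderID, CustomerID, OrderDate
FROM dbo.Orders;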

SELECT * can be a friend if it is used carefully and knowingly.

*Update* Here is the GroupBy Presentation I did on T-SQL for Beginners if you would like to watch it.

The song for this post is Bittersweet by Panic! At the Disco

I take it all back and want an “Undo”

“You made a mistake, now let it go, move on, and don’t make it again.”  It is a super easy thing to say to someone who has just had a bad day.  But when you are the one that made the mistake, it becomes a lot harder to move on.  You think about it, and assume everyone else is thinking about it too.  “Look at that loser over there, they brought the whole system down.”  You know they are thinking it.

Last week, I was in the situation of being the friend of someone that made a mistake.  I honestly had forgotten about it within an hour, but he kept bringing it up, over and over again.  All day long I was reminded of what he had done wrong because he wouldn’t let it go.  At the end of the second day, I told him that if he kept telling me he was terrible at his job, I would start to believe him.  I told him to let it go and if nothing else, to stop talking about it because we all had forgotten.  We all had forgiven him, he just couldn’t forgive himself.

This week, I was the one that made the mistake.  It was big and everyone saw it.  I really wanted a time machine or an “Undo” button.  Instead, I went home, cried, ate some chocolate, and watched cartoons to feel better.  My husband, and biggest cheerleader, asked me, “Are you perfect?”  I have been tricked by this one before, so I replied that I wasn’t.  He looked at me and said, “Then stop beating yourself up for being human.”  This made me think about how I handle things every day.  I also realized I had to take all the words I had said to my friend and apply them to myself.  I made a goal not to bring up the mistake to anyone after one day.  If they mention it, we can talk about it, or I can answer questions about the incident if needed.  Also, at the end of the day, I have to forgive myself and move on.  My entire team was AMAZING and kind when they learned it was my fault.  They knew how much it hurt to make a mistake and knew I would be my own harshest critic.  This is not to say that I am taking it lightly; it is something I won’t do again, and I am disappointed in myself.  I am really lucky to have an awesome boss who doesn’t hold mistakes over my head.

Every technology person I have met has a great story of how they messed up big.  If they don’t have one, they are either lying or don’t push themselves to grow.  I have a whole bag of them, but each one is something I learned and grew from, and thankfully, each one made me better.

So here goes, end of the day, time to forgive myself and move forward.

*Hugs*


The song today is: Take it all Back by Judah & the Lion

I won’t be late for this, late for that because I have Time Zone Info….

One of the new items in SQL Server 2016 is the super awesome sys.time_zone_info catalog view.  When I heard about it, I started to think about all the cool things it could help me do.  First, let’s look at it.

SELECT *
FROM [sys].[time_zone_info]

(screenshot: the first rows of sys.time_zone_info)

Yes, it is 132 rows of magic! Now that we have this super cool view, how do we use it? Let’s pretend that my data is time-stamped in US Mountain Standard Time, but I want to display it in Western Australia Standard Time.  I would do it like this:

SELECT GETDATE() AS GETDATE_Time,
 GETDATE() AT TIME ZONE 'US Mountain Standard Time' AS Mountain_Time,
 GETDATE() AT TIME ZONE 'US Mountain Standard Time' 
   AT TIME ZONE 'W. Australia Standard Time' AS W_Aus_Time;

I am including the GETDATE column so you can see that GETDATE returns my server’s local (Mountain) time, but SQL doesn’t know that on its own; I have to tell it what time zone the value is in before I can convert it to another one.

(screenshot: the query results showing GETDATE_Time, Mountain_Time, and W_Aus_Time)

Caution: If I tell it the GETDATE value is in the Hawaiian time zone instead of Mountain time, SQL won’t correct me; it will just do the math like a good little system and assume I know what time zone I am starting from.
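For example, this quick sketch of the wrong-zone mistake runs without any complaint, even though my server is not in Hawaii, and the answer is simply wrong.

-- Asserting the wrong starting zone: SQL happily does the math anyway.
SELECT GETDATE() AT TIME ZONE 'Hawaiian Standard Time'
       AT TIME ZONE 'W. Australia Standard Time' AS Wrong_W_Aus_Time;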

I am really loving the new features in SQL Server 2016! I hope you are enjoying them too!

This post’s song is Cleopatra by The Lumineers

What can I say except “You’re Welcome for the AG voting script”

We recently had an issue where the network between our geo-cluster nodes would go down and both Availability Group instances thought they were supposed to take charge.  When the network came back up, both of them still thought they were in charge.  You can imagine that with an AG, you can’t have two instances that think they are in charge without problems.  This brought up the question of how voting was configured between the two of them.  This script helped a bunch:

SELECT member_name, member_state_desc, number_of_quorum_votes
 FROM sys.dm_hadr_cluster_members;

We found that the File Share Witness wasn’t working properly by checking its member state.  In a simple AG, a good practice is to have each instance plus a File Share Witness, which keeps each side from accidentally taking over.  You’re welcome.
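If you just want to eyeball the witness, here is a quick sketch of the same query filtered down; FILE_SHARE_WITNESS is the member type a file share witness reports in this DMV:

-- Show only the File Share Witness, its state, and its quorum vote.
SELECT member_name, member_state_desc, number_of_quorum_votes
FROM sys.dm_hadr_cluster_members
WHERE member_type_desc = 'FILE_SHARE_WITNESS';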

The song for this post:  You’re Welcome From Moana

Lay your weary head to rest and run your query more…

I am just coming home from a wonderful week at PASS Summit.  I sat in two separate sessions where this trick came up and watched as people furiously wrote it down.  Did you know that you can set a query to run multiple times in one command?  All you have to do is add “GO” and the number of times you want it to run.  For example:

SELECT GETDATE()
GO 5

Will return something like this:

(screenshot: five result sets, each returning the current date and time)

Super cool, right?  It was mentioned in both Brent Ozar’s and Argenis Fernandez’s presentations, and I didn’t want to forget it.  Now you can easily run your queries more and create load on your test system without having to click Execute over and over again (see the sketch below).
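One caveat: GO is a batch separator understood by tools like SSMS and sqlcmd, not by the database engine itself, so this trick only works from those tools.  As a load-generation sketch, with a hypothetical dbo.LoadTest table:

-- dbo.LoadTest is a hypothetical table; this inserts one row per batch, 1000 times.
INSERT INTO dbo.LoadTest (CreatedAt) VALUES (GETDATE());
GO 1000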

Today’s post features Lyrics from Panic! At the Disco’s Cover of Carry On My Wayward Son

Listen to the AG that’s tricking you…


Another AG (Availability Group) post? Yes, I learned something new and it must be cataloged.  When you are failing AGs back and forth really fast and a major indexing job kicks off in the middle, it can force a transaction to roll back.  That rollback may take a REALLY long time, even if you were only on the node for 10 minutes and the large transaction had only been running for about 5.  When I failed back to my preferred primary node and the AG dashboard didn’t go completely green, I got worried.  Why in the world would it not go green?  I had just failed over to the preferred secondary and applied a patch (see? I learned.), and then I failed back.  It had been green when I started, green when I failed over to the secondary, and now one of my biggest databases was not synchronizing on the primary…*sigh*

I panicked.  In this situation I would normally pull the database out of the AG and then re-add it.  I didn’t have that option because it is a HUGE database, and I didn’t have the time or space to move it around.  I knew a large transaction had kicked off (thank you, alert email that I created to warn me about such things), but I thought that surely the rollback would have cleared quickly.  That led me to looking for rolling-back transactions.

I ran this on the alarming secondary node:

SELECT R.session_id, R.command, R.status, R.percent_complete
FROM sys.dm_exec_requests R
WHERE R.command IN ('KILLED/ROLLBACK', 'ROLLBACK');

To my surprise, there were no results.  Nothing was killed or rolling back; or was it? I ran the query again, but this time without the where clause.

SELECT R.session_id, R.command, R.status, R.percent_complete
FROM sys.dm_exec_requests R;

I saw one command listed as “UNKNOWN TOKEN” with a percent_complete of about 5%, and that percentage was rising.  I theorized that this was my rolling-back process and that when it finished, my AG would be healthy again.  The system isn’t used overnight, we had started the maintenance in the late afternoon, and it was the secondary node in trouble, so I had time to test my theory.  It was an agonizing 8 hours as I kept checking the percent_complete all evening.  It finally completed, and the AG went green.

My lesson learned: when my AG isn’t healthy and I have already resumed data movement, before I pull the database out of the AG, I need to check for processes that have a percent_complete on the secondary node (see the sketch below).  Being patient is really hard, but necessary with AGs.
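Something like this is what I mean; a minimal sketch that simply surfaces anything on the secondary reporting progress, whatever its command says:

-- Anything with a rising percent_complete is still working; be patient.
SELECT R.session_id, R.command, R.status, R.percent_complete
FROM sys.dm_exec_requests R
WHERE R.percent_complete > 0;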

The song that goes with this post is Listen to the Man.