Last week I did a few things wrong. The good news is I learned from it and now can prevent myself from repeating it.
So I have this AG, it is kind of important, hence the AG part but after 5pm I have two hours that it can be down, or so I thought. We recently added new functionality that requires it to be up all the time. I have been applying SQL 2014 SP2 to all of my 2014 servers all of which have been in AGs and it has been super easy, less than 10 minutes of work and only a reboot of downtime. (I am a little old fashioned and always reboot after a service pack.) My mind said, “Sweet, you can get this done and no one will notice and if you do both at the same time and delay the reboot on one of them, there will be no down time.” That was my first mistake. My second one was starting a few minutes early. I was excited to have it done because it was my last round of service packs. My final mistake was not realizing that SSISDDB is considered a system database and should NEVER be in an availability group. It may be super awesome because you can keep your SSIS catalog completely in sync. You could maybe do it if you plan to remove it before doing service packs or any kind of upgrades to the server, bur as far as setting it and forgetting it, you are in trouble if you do it.
Here are a few of the fun errors that we saw.
“Script level upgrade for database ‘master’ failed because upgrade step ‘SSIS_hotfix_install.sql’ encountered error 942, state 4, severity 25. This is a serious error condition which might interfere with regular operation and the database will be taken offline. If the error happened during upgrade of the ‘master’ database, it will prevent the entire SQL Server instance from starting. Examine the previous errorlog entries for errors, take the appropriate corrective actions and re-start the database so that the script upgrade steps run to completion.”
“Cannot recover the master database. SQL Server is unable to run. Restore master from a full backup, repair it, or rebuild it. For more information about how to rebuild the master database, see SQL Server Books Online.”
Huge thank you to all the people that have blogged about Trace Flag 902. It allowed us to start up SQL Server and find the errors and pull SSISDB out of the AG and get the service packs to finish running and everything was happy and great. Here is the list of steps from the Microsoft KB article:
Enable trace flag 902 on the instance of SQL Server. To do this, follow these steps:
- Open SQL Server Configuration Manager.
- In SQL Server Configuration Manager, click SQL Server Services.
- Double-click the SQL Serverservice.
- In the SQL Server Properties dialog box, click the Advanced tab.
- On click the Advanced tab, locate the Startup Parameters item.
- Add ;-T902 to the end of the existing string value, and then click OK.
Lessons Learned:
- Don’t start earlier than you say you are going to start.
- Don’t do both sides of an AG at the same time. Do the Passive one first and make sure everything is up and working and the AG is healthy before you do the primary node. Verify the fail-over works to the passive node so that you have no down time.
- Don’t get too comfortable with Service Packs and Cumulative Updates. They are still a big change even though they are pretty stable.
- Don’t have any system databases in the AG while applying patches and know that SSISDB is a part of the system databases.
- An awesome team that can back you up and help you trouble shoot can make all the difference. It is amazing to have a boss that believes in you and is encouraging to help you keep going even when you want to give up and go cry in the closet.
Also for those of you following along at home and what to know what song goes with this post: It’s Different for Girls
Happy Service Packing!
I think I kind of know the feeling you had when everything went wrong. Not a fun one! Thanks for sharing how you fixed this to help others.
So how do I know if I have an SSISDB in my AGs?
Open management studio, connect to one of the nodes in your AG and look under the “Always On High Availability” section. Expand the “Availability Groups” and you will see the name of your AG, under that there is 3 sections. The middle one is “Availability Databases” and if you expand it, you will see the names of all the databases participating in the AG. If SSISDB is one of them, you will want to remove it from the AG before you apply patches or updates of a SQL nature. You can add it back in once you are done if you would like. I have had a few people mention that they like to keep SSISDB in an AG so that their SSIS packages are always synchronized.
Thanks Andrea, just what I needed.
I also recently suffered an outage while applying SP2 to a SQL 2014 AG environment; I was a little too eager to patch the last node and didn’t notice the synchronization was being blocked before I kicked off the patching on the last instance. This resulted in none of the replicas knowing which one was the Primary. Luckily a couple of reboots got it sorted. Thanks for a great article, especially the lessons learnt section!
Maybe a good suggestion.. Do the steps first in a non-production Environment
I completely agree, the difficult part is that this environment doesn’t have a test environment at all. I had deployed this patch to a lot of environments already so I felt super confident, but that confidence was my downfall. 🙂
Hello of Andrea,
Thank you for the article, I have found it by the newsletter from Brent Ozar. Please excuse my bad English. I don’t want to be rude.
I have two links from MSDN, which describe the requirements and therefore your problems:
https://msdn.Microsoft.com/en-us/library/dn178483%28V=SQL.120%29.aspx
Have you had this article?
Kind regards
Arndt
Hello Andrea,
probably you had read this in the newsletter from Brent Ozar:
https://blogs.msdn.microsoft.com/sql_server_team/cdc-functionality-may-break-after-upgrading-to-the-latest-cu-for-sql-server-2012-2014-and-2016/
it’s important for SSIS 😦
And the link https://msdn.microsoft.com/en-us/library/mt163864.aspx is broken. :((
Kind regards
Don’t you love when they break them? The Article from Brent is great but I don’t use CDC so it didn’t affect our AG. Always something to keep in mind. Thank you!