While doing a planned Availability Group failover, the application stopped talking to the database. After checking the SQL Server log, we found that all the SQL Logins were failing with an “incorrect password” error. The logins were on the server, the users were in the databases, and the passwords were even right, so what was wrong? It all comes down to SID’s (Security Identifiers).
I am going to back up a little bit and point out the few main items that don’t automatically synchronize between two AG Servers. I like to think in terms of Database Level vs Server Level. If it is stored in a database and that database is in the AG, then it will synchronize. The msdb database and the master database can’t be in an AG, those two hold the agent jobs, linked servers, backup history and logins. Because a sql login can be created on two different servers, even if they have the same username and password, they can have a different SID which will make them mismatch. That is what had happened in this case.
I like to use a script when creating these logins that has the SID specified so they will match between all the AG nodes. It can be done like this:
CREATE Login MyLogin WITH password = 'SecurePassword', SID = 0x06046EG0F6C0C6488GCGG3G815EFF5GA
But how can I check if the SIDs match once they are created? I like to create a temporary linked server between the nodes to make it easier for me to verify them (it can introduce a security risk if left there). Then I run something like this:
SELECT A.name, A.sid, B.name, B.sid
FROM NodeA.master.dbo.sysusers A
LEFT JOIN NodeB.master.dbo.sysusers B ON A.name = B.name
WHERE A.sid <> B.sid
I absolutely could do this without the temporary linked server by querying each node and comparing the SID’s but this is a little faster.
This posts song is Only Love Can Hurt Like This by Paloma Faith