You’re like the calmest failing job, I need you louder with a DPA alert…

I have an RDS instance that when backups are failing, we have no idea. We use DPA as one of our alerting systems, but with RDS, failing agent jobs are harder to find. I took the built in RDS status stored procedure and adjusted it enough to send me an alert through DPA so I can know that there are problems.

There are a couple of things I have already done, I have set up my RDS instance to talk to DPA, I have granted my DPA user access to run what it needs in RDS and I have set up a user in DPA that can email\call me when there is a problem. I used this post about DPA to help me get the basics done and then I did a lot of trial and error on my code to get the alert working just right.

This is the basic status stored proc that is telling me if my backups are working or not:

exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

I tried a few things and finally realized I needed to drop it into a temp table so that I could filter it down to only get what I need:

CREATE TABLE #tempAlert 
(task_id	bigint
,task_type	varchar(200)
,database_name	varchar(200)
,[% complete] int
,[duration(mins)] int	
,lifecycle varchar(20)
,task_info	varchar(8000)
,last_updated	datetime2
,created_at	datetime2
,S3_object_arn	varchar(2000)
,overwrite_S3_backup_file	int
,KMS_master_key_arn	varchar(200)
,filepath	varchar(200)
,overwrite_file int);

INSERT INTO #tempAlert
exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

FROM #TempAlert
WHERE task_type = 'BACKUP_DB_DIFFERENTIAL' AND last_updated > GETDATE()-1 AND lifecycle <> 'SUCCESS';

DROP TABLE #TempAlert;

This code is specifically looking for any diff failures in the last day, if I want fulls, I switch task_type = ‘BACKUP_DB_DIFFERENTIAL’ to task_type = ‘BACKUP_DB’.

In DPA, I go to ALERTS >>Manage Alerts and select “Custom” for my Alert Category and “Customer SQL Alert – Multiple Numeric Return” for my Alert Type then select “Create Alert”.

I gave it a name that was descriptive and changed my Execution Interval to once a day. I don’t want it to be firing all the time against my RDS instance and running up my bill. I could run it more often if I wanted. I added some text to my “Notification Text” box that will be helpful to anyone that gets the alert. I selected my Database instances that I wanted this alert to execute against. In the “SQL Statement” I pasted the bit of code above that creates the temp table, runs the stored procedure and inserts the results into that temp table, then filters the results and finally drops the temp table.

Finally, in the Alert Level and Notifications section, I set the “High” row of “Min” to a value of one and the “Max” I left empty, then added my Notification Group. I tested it to make sure all the connections were working properly and then I saved it. “I am the backup, I am the failure and I am the alert”.

The song for this post is Matt Maeson – Mr. Rattlebone

All of my friends say, “How much longer will it run?”

I have frequently needed to see how much longer a backup was going to run, or how much longer a restore was going to run.  This little bit of code is something I use to help me know how to plan.  I replace “Backup” with “Restore” if I want to see how much longer to plan for a restore.  I also use it for when I am tracking a rollback.  I remove the where clause and get a large list, the one that has a percentage and not a clear explanation is usually the one I want.   Sometimes the time remaining is lie.  I have had it imply 4 hours, when it took 13.  The comfort is that I can tell it is still working.

SELECT command, percent_complete,
'elapsed' = total_elapsed_time / 60000.0,
'remaining' = estimated_completion_time / 60000.0
FROM sys.dm_exec_requests
WHERE command like 'BACKUP%'

The song for this post is “Just Another Girl” by The Killers