You’re like the calmest failing job, I need you louder with a DPA alert…

I have an RDS instance that when backups are failing, we have no idea. We use DPA as one of our alerting systems, but with RDS, failing agent jobs are harder to find. I took the built in RDS status stored procedure and adjusted it enough to send me an alert through DPA so I can know that there are problems.

There are a couple of things I have already done, I have set up my RDS instance to talk to DPA, I have granted my DPA user access to run what it needs in RDS and I have set up a user in DPA that can email\call me when there is a problem. I used this post about DPA to help me get the basics done and then I did a lot of trial and error on my code to get the alert working just right.

This is the basic status stored proc that is telling me if my backups are working or not:

exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

I tried a few things and finally realized I needed to drop it into a temp table so that I could filter it down to only get what I need:

CREATE TABLE #tempAlert 
(task_id	bigint
,task_type	varchar(200)
,database_name	varchar(200)
,[% complete] int
,[duration(mins)] int	
,lifecycle varchar(20)
,task_info	varchar(8000)
,last_updated	datetime2
,created_at	datetime2
,S3_object_arn	varchar(2000)
,overwrite_S3_backup_file	int
,KMS_master_key_arn	varchar(200)
,filepath	varchar(200)
,overwrite_file int);

INSERT INTO #tempAlert
exec msdb.dbo.rds_task_status @db_name='MyDatabaseName';

SELECT *
FROM #TempAlert
WHERE task_type = 'BACKUP_DB_DIFFERENTIAL' AND last_updated > GETDATE()-1 AND lifecycle <> 'SUCCESS';

DROP TABLE #TempAlert;

This code is specifically looking for any diff failures in the last day, if I want fulls, I switch task_type = ‘BACKUP_DB_DIFFERENTIAL’ to task_type = ‘BACKUP_DB’.

In DPA, I go to ALERTS >>Manage Alerts and select “Custom” for my Alert Category and “Customer SQL Alert – Multiple Numeric Return” for my Alert Type then select “Create Alert”.

I gave it a name that was descriptive and changed my Execution Interval to once a day. I don’t want it to be firing all the time against my RDS instance and running up my bill. I could run it more often if I wanted. I added some text to my “Notification Text” box that will be helpful to anyone that gets the alert. I selected my Database instances that I wanted this alert to execute against. In the “SQL Statement” I pasted the bit of code above that creates the temp table, runs the stored procedure and inserts the results into that temp table, then filters the results and finally drops the temp table.

Finally, in the Alert Level and Notifications section, I set the “High” row of “Min” to a value of one and the “Max” I left empty, then added my Notification Group. I tested it to make sure all the connections were working properly and then I saved it. “I am the backup, I am the failure and I am the alert”.

The song for this post is Matt Maeson – Mr. Rattlebone

Little Bit of Love, when your logs are rolling too much.

In January, the awesome Tim Radney (b|t) talked to the Utah user group about best practices. One that he mentioned was rolling over your error logs everyday and keeping 35 logs (a month plus 3 reboots). I loved this idea and implemented it using what I had done here and adding it to an agent job.

Then I realized we didn’t have any alerts on if our logs were rolling too much. Way back in my career, it used to be something that I would watch and it could mean someone was trying to hack your system and cover their tracks by rolling your logs over a bunch. I fought so much with figuring how to tell if my logs are rolling over, I had to save it for the future.

DROP TABLE IF EXISTS #EnumErrorLog;

CREATE TABLE #EnumErrorLog
(
    [Archive#] varchar(3) NOT NULL
        PRIMARY KEY CLUSTERED
    , [Date] datetime NOT NULL
    , [LogFileSizeByte] int NOT NULL
);

INSERT INTO #EnumErrorLog ([Archive#], [Date], [LogFileSizeByte])
EXEC sys.sp_enumerrorlogs;


SELECT CASE WHEN COUNT([Archive#]) > = 5 THEN 1 ELSE 0 END
FROM #EnumErrorLog
WHERE Date > DATEADD(hour, -3, GETDATE())

I create a temp table so I can execute a system stored proc to pull the information into a table and select it back out. I run this alert check once an hour, which means that for 3 hours if the alert condition has been met, it will alert me that something has rolled over too much (1 means to alert, 0 means to not do anything). I am using a third party tool right now, but I bet this could be set up with native SQL alerts or agent jobs.

The song for this post is Little Bit of Love by JP Cooper, it makes me smile even on the toughest days. *hugs*

Help me, help me, my logs are over-sized!

I have a lot of servers and only home-grown monitoring.  I needed a solution that would tell me if a log file started to grow outside of its normal size.  I listened carefully at PASS Summit on how I should “right-size” my log files and came back with a plan to tackle all of them. Fast-forward a few months and I am still having trouble even after sizing them to what I thought they should be. One of the problems is that my backup solution is “touchy” with log backups and awesome at getting the full ones. I built a report to watch for missing transaction log backups, which isn’t perfect yet.  I needed a more immediate solution and after getting another Disk Space Email Alert, I knew what I had to do.  Here it is, my Large Log File Alert.

CREATE PROCEDURE [dbo].[usp_LargeLogFileAlert]
 @LogThreshold int = 1024
AS
--This procedure sends out an alert message when a logfile has exceeded a set threshold.
--It can be scheduled to run daily so you can act quickly to resolve.

IF EXISTS (SELECT * FROM tempdb..sysobjects
WHERE id = object_id(N'[tempdb]..[#LargeLogFileAlert]'))
DROP TABLE #LargeLogFileAlert
CREATE TABLE #LargeLogFileAlert (
 DatabaseName VARCHAR(50) NOT NULL,
 LogMB INTEGER NOT NULL)

DECLARE @SizeMB INT
DECLARE @DatabaseName VARCHAR(50)
DECLARE @AlertMessage VARCHAR(500)
DECLARE @MailSubject VARCHAR(100)

/* Populate #LargeLogFileAlert with data */
INSERT INTO #LargeLogFileAlert
SELECT RTRIM(instance_name) AS DatabaseName, cntr_value/1024 AS SizeMB
FROM sys.dm_os_performance_counters
WHERE counter_name = 'Log File(s) Used Size (KB)'
AND instance_name != '_Total' AND cntr_value/1024 &amp;gt; @LogThreshold --Threshold in MB

DECLARE LogAlert CURSOR FAST_FORWARD FOR
SELECT DatabaseName, LogMB FROM #LargeLogFileAlert 

OPEN LogAlert
FETCH NEXT FROM LogAlert INTO @DatabaseName, @SizeMB

WHILE (@@FETCH_STATUS = 0)
BEGIN
SET @MailSubject = 'Log for ' + @DatabaseName + ' has exceeded threshold on ' + @@SERVERNAME
SET @AlertMessage = @DatabaseName + ' is ' + cast(@SizeMB AS VARCHAR) + ' MB. Please verify transaction log backups are working.'
-- Send out email
EXEC msdb..sp_send_dbmail @recipients = 'you@email.com',--Change to your email
@subject = @MailSubject,
@body = @AlertMessage
FETCH NEXT FROM LogAlert INTO @DatabaseName, @SizeMB
END
CLOSE LogAlert
DEALLOCATE LogAlert
DROP TABLE #LargeLogFileAlert

GO

The threshold is adjustable by server. I picked the one I did because my servers have a lot of small databases.

Next I tossed it into a SQL Agent Job and have it run once a day on each server. It is light so I am not worried about it causing performance issues. It will email me for each database that has a log file larger than the limit I set. It won’t email at all if there aren’t any. Once I get my logs sized properly, I can turn them off. I used my CMS to deploy the stored procedure and SQL Agent job to all my servers and Yay, I know more about my environment.

This is my first attempt at joining the blog party which is number 62 – hosted this time by Robert Pearl (@pearlknows), hopefully I did it right.