SQL Monitor’s data repository: Alerts
- by Chris Lambrou
In my previous post, I introduced the SQL Monitor data repository, and described how the monitored objects are stored in a hierarchy in the data schema, in a series of tables with a _Keys suffix. In this post I had planned to describe how the actual data for the monitored objects is stored in corresponding tables with _StableSamples and _UnstableSamples suffixes. However, I’m going to postpone that until my next post, as I’ve had a request from a SQL Monitor user to explain how alerts are stored.
In the SQL Monitor data repository, alerts are stored in tables belonging to the alert schema, which contains the following five tables:
	alert.Alert
	alert.Alert_Cleared
	alert.Alert_Comment
	alert.Alert_Severity
	alert.Alert_Type
In this post, I’m only going to cover the alert.Alert and alert.Alert_Type tables. I may cover the other three tables in a later post. The most important table in this schema is alert.Alert, as each row in this table corresponds to a single alert. So let’s have a look at it.
SELECT TOP 100 AlertId,
    AlertType,
    TargetObject,
    [Read],
    SubType
  FROM alert.Alert
  ORDER BY AlertId DESC;
 | AlertId | AlertType | TargetObject | Read | SubType
1 | 65550 | 39 | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,9:SqlServer,1,4:Name,s0:, | 1 | 0
2 | 65549 | 38 | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,7:Machine,1,4:Name,s0:, | 1 | 0
3 | 65548 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
4 | 65547 | 15 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
5 | 65546 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
6 | 65545 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
7 | 65544 | 15 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
8 | 65543 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
9 | 65542 | 18 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0
10 | 65541 | 14 | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0
11 | …
So what are we seeing here, then? Well, AlertId is an auto-incrementing identity column, so ORDER BY AlertId DESC ensures that we see the most recent alerts first. AlertType indicates the type of each alert, such as Job failed (6), Backup overdue (14) or Long-running query (12). The TargetObject column indicates which monitored object the alert is associated with. The Read column acts as a flag to indicate whether or not the alert has been read. And finally the SubType column is used in the case of a Custom metric (40) alert, to indicate which custom metric the alert pertains to.
Okay, now let’s look at some of those columns in more detail. The AlertType column is an easy one to start with, and it brings us nicely to the next table, alert.Alert_Type. Let’s have a look at what’s in this table:
SELECT AlertType,
    Event,
    Monitoring,
    Name,
    Description
  FROM alert.Alert_Type
  ORDER BY AlertType;
 | AlertType | Event | Monitoring | Name | Description
1 | 1 | 0 | 0 | Processor utilization | Processor utilization (CPU) on a host machine stays above a threshold percentage for longer than a specified duration
2 | 2 | 1 | 0 | SQL Server error log entry | An error is written to the SQL Server error log with a severity level above a specified value.
3 | 3 | 1 | 0 | Cluster failover | The active cluster node fails, causing the SQL Server instance to switch nodes.
4 | 4 | 1 | 0 | Deadlock | SQL deadlock occurs.
5 | 5 | 0 | 0 | Processor under-utilization | Processor utilization (CPU) on a host machine remains below a threshold percentage for longer than a specified duration
6 | 6 | 1 | 0 | Job failed | A job does not complete successfully (the job returns an error code).
7 | 7 | 0 | 0 | Machine unreachable | Host machine (Windows server) cannot be contacted on the network.
8 | 8 | 0 | 0 | SQL Server instance unreachable | The SQL Server instance is not running or cannot be contacted on the network.
9 | 9 | 0 | 0 | Disk space | Disk space used on a logical disk drive is above a defined threshold for longer than a specified duration.
10 | 10 | 0 | 0 | Physical memory | Physical memory (RAM) used on the host machine stays above a threshold percentage for longer than a specified duration.
11 | 11 | 0 | 0 | Blocked process | SQL process is blocked for longer than a specified duration.
12 | 12 | 0 | 0 | Long-running query | A SQL query runs for longer than a specified duration.
13 | 14 | 0 | 0 | Backup overdue | No full backup exists, or the last full backup is older than a specified time.
14 | 15 | 0 | 0 | Log backup overdue | No log backup exists, or the last log backup is older than a specified time.
15 | 16 | 0 | 0 | Database unavailable | Database changes from Online to any other state.
16 | 17 | 0 | 0 | Page verification | Torn Page Detection or Page Checksum is not enabled for a database.
17 | 18 | 0 | 0 | Integrity check overdue | No entry for an integrity check (DBCC DBINFO returns no date for dbi_dbccLastKnownGood field), or the last check is older than a specified time.
18 | 19 | 0 | 0 | Fragmented indexes | Fragmentation level of one or more indexes is above a threshold percentage.
19 | 24 | 0 | 0 | Job duration unusual | The duration of a SQL job duration deviates from its baseline duration by more than a threshold percentage.
20 | 25 | 0 | 1 | Clock skew | System clock time on the Base Monitor computer differs from the system clock time on a monitored SQL Server host machine by a specified number of seconds.
21 | 27 | 0 | 0 | SQL Server Agent Service status | The SQL Server Agent Service status matches the status specified.
22 | 28 | 0 | 0 | SQL Server Reporting Service status | The SQL Server Reporting Service status matches the status specified.
23 | 29 | 0 | 0 | SQL Server Full Text Search Service status | The SQL Server Full Text Search Service status matches the status specified.
24 | 30 | 0 | 0 | SQL Server Analysis Service status | The SQL Server Analysis Service status matches the status specified.
25 | 31 | 0 | 0 | SQL Server Integration Service status | The SQL Server Integration Service status matches the status specified.
26 | 33 | 0 | 0 | SQL Server Browser Service status | The SQL Server Browser Service status matches the status specified.
27 | 34 | 0 | 0 | SQL Server VSS Writer Service status | The SQL Server VSS Writer status matches the status specified.
28 | 35 | 0 | 1 | Deadlock trace flag disabled | The monitored SQL Server’s trace flag cannot be enabled.
29 | 36 | 0 | 0 | Monitoring stopped (host machine credentials) | SQL Monitor cannot contact the host machine because authentication failed.
30 | 37 | 0 | 0 | Monitoring stopped (SQL Server credentials) | SQL Monitor cannot contact the SQL Server instance because authentication failed.
31 | 38 | 0 | 0 | Monitoring error (host machine data collection) | SQL Monitor cannot collect data from the host machine.
32 | 39 | 0 | 0 | Monitoring error (SQL Server data collection) | SQL Monitor cannot collect data from the SQL Server instance.
33 | 40 | 0 | 0 | Custom metric | The custom metric value has passed an alert threshold.
34 | 41 | 0 | 0 | Custom metric collection error | SQL Monitor cannot collect custom metric data from the target object.
Basically, alert.Alert_Type is just a big reference table containing information about the 34 different alert types supported by SQL Monitor (note that the largest id is 41, not 34 – some alert types have been retired since SQL Monitor was first developed). The Name and Description columns are self-explanatory, and I’m going to skip over the Event and Monitoring columns as they’re not very interesting. The AlertType column is the primary key, and is referenced by AlertType in the alert.Alert table. As such, we can rewrite our earlier query to join these two tables, in order to provide a more readable view of the alerts:
SELECT TOP 100 AlertId,
    Name,
    TargetObject,
    [Read],
    SubType
  FROM alert.Alert a 
    JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
  ORDER BY AlertId DESC;
 | AlertId | Name | TargetObject | Read | SubType
1 | 65550 | Monitoring error (SQL Server data collection) | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,9:SqlServer,1,4:Name,s0:, | 0 | 0
2 | 65549 | Monitoring error (host machine data collection) | 7:Cluster,1,4:Name,s29:srp-mr03.testnet.red-gate.com,7:Machine,1,4:Name,s0:, | 0 | 0
3 | 65548 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
4 | 65547 | Log backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
5 | 65546 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s15:FavouriteThings, | 0 | 0
6 | 65545 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
7 | 65544 | Log backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
8 | 65543 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData, | 0 | 0
9 | 65542 | Integrity check overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0
10 | 65541 | Backup overdue | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s4:msdb, | 0 | 0
Okay, the next column to discuss in the alert.Alert table is TargetObject. Oh boy, this one’s a bit tricky! The TargetObject of an alert is a serialized string representation of the position in the monitored object hierarchy of the object to which the alert pertains. The serialization format is somewhat convenient for parsing in the C# source code of SQL Monitor, and has some helpful characteristics, but it’s probably very awkward to manipulate in T-SQL.
I could document the serialization format here, but it would be very dry reading, so perhaps it’s best to consider an example from the table above. Have a look at the alert with an AlertId of 65543. It’s a Backup overdue alert for the SqlMonitorData database running on the default instance of granger, my laptop. Each different alert type is associated with a specific type of monitored object in the object hierarchy (I described the hierarchy in my previous post). The Backup overdue alert is associated with databases, whose position in the object hierarchy is root → Cluster → SqlServer → Database. The TargetObject value identifies the target object by specifying the key properties at each level in the hierarchy, thus:
	Cluster: Name = "granger"
	SqlServer: Name = "" (an empty string, denoting the default instance)
	Database: Name = "SqlMonitorData"
Well, look at the actual TargetObject value for this alert: "7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s14:SqlMonitorData,". It is indeed composed of three parts, one for each level in the hierarchy:
	Cluster: "7:Cluster,1,4:Name,s7:granger,"
	SqlServer: "9:SqlServer,1,4:Name,s0:,"
	Database: "8:Database,1,4:Name,s14:SqlMonitorData,"
Each part is handled in exactly the same way, so let’s concentrate on the first part, "7:Cluster,1,4:Name,s7:granger,". It comprises the following:
	"7:Cluster," – This identifies the level in the hierarchy.
	"1," – This indicates how many different key properties there are to uniquely identify a cluster (we saw in my last post that each cluster is identified by a single property, its Name).
	"4:Name,s7:granger," – This represents the Name property, and its corresponding value, granger. It’s split up like this:
		"4:Name," – Indicates the name of the key property.
		"s" – Indicates the type of the key property; in this case, it’s a string.
		"7:granger," – Indicates the value of the property.
At this point, you might be wondering about the format of some of these strings. Why is the string "Cluster" stored as "7:Cluster,"? Well, an encoding scheme is used, which consists of the following:
	"7" – This is the length of the string "Cluster"
	":" – This is a delimiter between the length of the string and the actual string’s contents.
	"Cluster" – This is the string itself. 7 characters.
	"," – This is a final terminating character that indicates the end of the encoded string.
You can see that "4:Name,", "8:Database," and "14:SqlMonitorData," also conform to the same encoding scheme.
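The encoding scheme is simple enough to sketch in a few lines. The Python below is an illustration only, not SQL Monitor’s own code (which is C#): encode_string and decode_string are hypothetical helper names, and it assumes the length prefix counts characters, as the "Cluster" example suggests.

```python
from typing import Tuple


def encode_string(text: str) -> str:
    """Encode text as '<length>:<text>,' (assumes length is in characters)."""
    return f"{len(text)}:{text},"


def decode_string(encoded: str, pos: int = 0) -> Tuple[str, int]:
    """Decode one length-prefixed string starting at index pos.

    Returns the decoded text and the index just past the terminating comma.
    """
    colon = encoded.index(":", pos)       # delimiter after the length
    length = int(encoded[pos:colon])      # the length prefix, e.g. 7
    start = colon + 1
    end = start + length
    if encoded[end:end + 1] != ",":       # the terminator must follow the text
        raise ValueError(f"expected ',' at index {end}")
    return encoded[start:end], end + 1


print(encode_string("Cluster"))             # 7:Cluster,
print(decode_string("14:SqlMonitorData,"))  # ('SqlMonitorData', 18)
```

One consequence of the length prefix is that a value can itself contain commas or colons without any escaping, since the decoder reads a fixed number of characters rather than scanning for a delimiter.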
In the example above, the "s" character is used to indicate that the value of the Name property is a string. If you explore the TargetObject property of alerts in your own SQL Monitor data repository, you might find other characters used for other non-string key property values. The different value types you might possibly encounter are as follows:
	"I" – Denotes a bigint value. For example, "I65432,".
	"g" – Denotes a GUID value. For example, "g32116732-63ae-4ab5-bd34-7dfdfb084c18,".
	"d" – Denotes a datetime value. For example, "d634815384796832438,". The value is stored as a bigint, rather than a native SQL datetime value. I’ll describe how datetime values are handled in the SQL Monitor data repository in a future post.
I suggest you have a look at the alerts in your own SQL Monitor data repository for further examples, so you can see how the TargetObject values are composed for each of the different types of alert. Let me give one further example, though, that represents a Custom metric alert, as this will help in describing the final column of interest in the alert.Alert table, SubType. Let me show you the alert I’m interested in:
SELECT AlertId,
    a.AlertType,
    Name,
    TargetObject,
    [Read],
    SubType
  FROM alert.Alert a 
    JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
  WHERE AlertId = 65769;
 | AlertId | AlertType | Name | TargetObject | Read | SubType
1 | 65769 | 40 | Custom metric | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2, | 0 | 2
An AlertType value of 40 corresponds to the Custom metric alert type. The Name taken from the alert.Alert_Type table is simply Custom metric, but this doesn’t tell us anything about the specific custom metric that this alert pertains to. That’s where the SubType value comes in. For custom metric alerts, this provides us with the Id of the specific custom alert definition that can be found in the settings.CustomAlertDefinitions table. I don’t really want to delve into custom alert definitions yet (maybe in a later post), but an extra join in the previous query shows us that this alert pertains to the CPU pressure (avg runnable task count) custom metric alert.
SELECT AlertId,
    a.AlertType,
    at.Name,
    cad.Name AS CustomAlertName,
    TargetObject,
    [Read],
    SubType
  FROM alert.Alert a 
    JOIN alert.Alert_Type at ON a.AlertType = at.AlertType
    JOIN settings.CustomAlertDefinitions cad ON a.SubType = cad.Id
  WHERE AlertId = 65769;
 | AlertId | AlertType | Name | CustomAlertName | TargetObject | Read | SubType
1 | 65769 | 40 | Custom metric | CPU pressure (avg runnable task count) | 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2, | 0 | 2
The TargetObject value in this case breaks down like this:
	"7:Cluster,1,4:Name,s7:granger," – Cluster named "granger".
	"9:SqlServer,1,4:Name,s0:," – SqlServer named "" (the default instance).
	"8:Database,1,4:Name,s6:master," – Database named "master".
	"12:CustomMetric,1,8:MetricId,I2," – Custom metric with an Id of 2.
Note that the hierarchy for a custom metric is slightly different compared to the earlier Backup overdue alert. It’s root → Cluster → SqlServer → Database → CustomMetric. Also notice that, unlike Cluster, SqlServer and Database, the key property for CustomMetric is called MetricId (not Name), and the value is a bigint (not a string).
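Putting the pieces together, a TargetObject value can be unpicked outside the database with a small parser. This is a rough Python sketch rather than anything taken from SQL Monitor: parse_target_object and its helpers are made-up names, and it assumes that the key-property count is a plain decimal number followed by a comma, that "I", "g" and "d" values run up to the next comma (as in the examples earlier), and that multiple key properties, where they exist, simply follow one another.

```python
from typing import Any, Dict, List, Tuple


def _read_string(s: str, pos: int) -> Tuple[str, int]:
    """Read one '<length>:<text>,' token; return (text, next position)."""
    colon = s.index(":", pos)
    end = colon + 1 + int(s[pos:colon])
    if s[end:end + 1] != ",":
        raise ValueError(f"expected ',' at index {end}")
    return s[colon + 1:end], end + 1


def _read_until_comma(s: str, pos: int) -> Tuple[str, int]:
    """Read raw text up to the next comma; return (text, next position)."""
    end = s.index(",", pos)
    return s[pos:end], end + 1


def parse_target_object(s: str) -> List[Tuple[str, Dict[str, Any]]]:
    """Parse a TargetObject value into [(level, {key: value}), ...]."""
    levels = []
    pos = 0
    while pos < len(s):
        level, pos = _read_string(s, pos)          # e.g. "Cluster"
        count_text, pos = _read_until_comma(s, pos)  # number of key properties
        keys: Dict[str, Any] = {}
        for _ in range(int(count_text)):
            name, pos = _read_string(s, pos)       # e.g. "Name"
            tag = s[pos]                           # value type: s, I, g or d
            pos += 1
            if tag == "s":
                value, pos = _read_string(s, pos)
            elif tag in ("I", "d"):
                raw, pos = _read_until_comma(s, pos)
                value = int(raw)                   # 'd' is kept as the raw bigint
            elif tag == "g":
                value, pos = _read_until_comma(s, pos)
            else:
                raise ValueError(f"unknown type tag {tag!r} at index {pos - 1}")
            keys[name] = value
        levels.append((level, keys))
    return levels


example = ("7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,"
           "8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2,")
for level, keys in parse_target_object(example):
    print(level, keys)
# Cluster {'Name': 'granger'}
# SqlServer {'Name': ''}
# Database {'Name': 'master'}
# CustomMetric {'MetricId': 2}
```

The parser deliberately leaves "d" values as integers, since the post does not yet say how those bigint values map to dates.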
Finally, delving into the custom metric tables is beyond the scope of this post, but for the sake of avoiding any future confusion, I’d like to point out that whilst the SubType references a custom alert definition, the MetricId value embedded in the TargetObject value references a custom metric definition. Although in this case both the custom metric definition and custom alert definition share the same Id value of 2, this is not generally the case.
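Because TargetObject is just a string, one way to find the alerts for a known object is to build the expected value and compare against it. The following Python is a minimal sketch under that assumption, not part of SQL Monitor: build_target_object is a hypothetical helper, and it only handles string ("s") and integer ("I") key values.

```python
from typing import List, Sequence, Tuple, Union

KeyValue = Union[str, int]
Level = Tuple[str, Sequence[Tuple[str, KeyValue]]]


def _encode_string(text: str) -> str:
    """Encode text as '<length>:<text>,'."""
    return f"{len(text)}:{text},"


def _encode_value(value: KeyValue) -> str:
    """Encode a key value with its type tag: 's' for strings, 'I' for integers."""
    if isinstance(value, str):
        return "s" + _encode_string(value)
    if isinstance(value, int):
        return f"I{value},"
    raise TypeError(f"unsupported key value type: {type(value).__name__}")


def build_target_object(levels: Sequence[Level]) -> str:
    """Serialize [(level, [(key, value), ...]), ...] into a TargetObject string."""
    parts: List[str] = []
    for level_name, keys in levels:
        parts.append(_encode_string(level_name))
        parts.append(f"{len(keys)},")            # number of key properties
        for key_name, value in keys:
            parts.append(_encode_string(key_name))
            parts.append(_encode_value(value))
    return "".join(parts)


target = build_target_object([
    ("Cluster", [("Name", "granger")]),
    ("SqlServer", [("Name", "")]),               # empty name = default instance
    ("Database", [("Name", "master")]),
    ("CustomMetric", [("MetricId", 2)]),
])
print(target)
# 7:Cluster,1,4:Name,s7:granger,9:SqlServer,1,4:Name,s0:,8:Database,1,4:Name,s6:master,12:CustomMetric,1,8:MetricId,I2,
```

The resulting string could then be supplied as a parameter to a query that filters alert.Alert on TargetObject; note that the MetricId here is the custom metric definition's Id, not the SubType.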
Okay, that’s enough for now, not least because as I’m typing this, it’s almost 2am, I have to go to work tomorrow, and my alarm is set for 6am – eek! In my next post, I’ll either cover the remaining three tables in the alert schema, or I’ll delve into the way SQL Monitor stores its monitoring data, as I’d originally planned to cover in this post.