EMS Log-Monitoring

How to integrate the EMS-Log into an existing System-Monitoring solution

In the article "ONTAP REST APIs: Automate Notification of High-Severity Events", Mahalakshmi describes how to use EMS messages to get notified about system events, depending on their type and severity. It's a really flexible and comprehensive way to monitor NetApp's ONTAP.

What if you already have a system-monitoring solution like Nagios, Icinga, op5 Monitor or Shinken in place? In that case the destinations (the recipients of the notifications) are already defined in the monitoring system, so all you need is a monitoring plugin that does the filtering part.

Fortunately, our check_netapp_ems plugin can be used for that: it pulls the EMS log from the filer, filters it by event name and calculates the rate of matching events. If that rate is too high, the destinations are notified.
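The filter-then-rate idea can be sketched in a few lines of Python. This is only an illustration, not the plugin's actual implementation: the function name event_rate, the (timestamp, name) tuple shape and the shell-style name matching are all assumptions made for this example, and the plugin's real formula may differ.

```python
from datetime import datetime, timedelta, timezone
from fnmatch import fnmatchcase


def event_rate(events, name_pattern, lookbehind, now, per=timedelta(days=1)):
    """Return the number of matching events per `per` interval.

    events       -- iterable of (timestamp, event_name) tuples (hypothetical shape)
    name_pattern -- event name, shell-style wildcards allowed (assumption)
    lookbehind   -- timedelta: how far back from `now` to look
    """
    window_start = now - lookbehind
    matching = sum(
        1
        for timestamp, name in events
        if window_start <= timestamp <= now and fnmatchcase(name, name_pattern)
    )
    # Scale the count in the window to the requested unit, e.g. "per day".
    return matching * (per / lookbehind)


now = datetime(2024, 1, 2, 12, 0, tzinfo=timezone.utc)
sample_events = [
    (now - timedelta(hours=1), "wafl.vol.autoSize.done"),
    (now - timedelta(hours=5), "wafl.vol.autoSize.done"),
    (now - timedelta(hours=20), "wafl.vol.autoSize.done"),
    (now - timedelta(hours=30), "wafl.vol.autoSize.done"),  # outside 24 h
    (now - timedelta(hours=2), "some.other.event"),         # name mismatch
]
rate_24h = event_rate(sample_events, "wafl.vol.autoSize.done", timedelta(days=1), now)
rate_12h = event_rate(sample_events, "wafl.vol.autoSize.done", timedelta(hours=12), now)
```

With the sample data, three events fall into the 24-hour window, and two fall into the 12-hour window, which is then scaled up to a per-day figure.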

An example makes this clearer. This command …

$ ./check_netapp_ems event-rate -H filer --name=wafl.vol.autoSize.done --rate=per_day --lookbehind=1d

… will output e.g. …

Rate of wafl.vol.autoSize.done EMS events during the last 24 hours: 13.82/day
...

So in the example above we had nearly 14 autosize events during the last 24 hours. You can now set thresholds like --warning=15 --critical=30 to get a notification only if the rate of events goes above 15 per day (WARNING notification) or 30 per day (CRITICAL notification).
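How such thresholds translate into a check result can be sketched as follows. The exit codes 0, 1 and 2 for OK, WARNING and CRITICAL are the usual convention for Nagios-compatible plugins; the function name status_for_rate is hypothetical, and the strict "greater than" comparison is an assumption based on the wording "goes above".

```python
# Conventional exit codes of Nagios-compatible plugins.
OK, WARNING, CRITICAL = 0, 1, 2
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL"}


def status_for_rate(rate, warning, critical):
    """Map an event rate to a plugin status (hypothetical helper)."""
    if rate > critical:
        return CRITICAL
    if rate > warning:
        return WARNING
    return OK


# The rate from the example output, checked against --warning=15 --critical=30.
example_status = status_for_rate(13.82, warning=15, critical=30)
example_label = LABELS[example_status]
```

A real plugin would print a status line and hand the numeric status to sys.exit(), so that the monitoring system can decide whether to notify anyone.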

This differs from what Mahalakshmi's script does, as the plugin goes a step further: it notifies you only when a certain number of events occur within a given period of time.

A check that alarms on every single matching event could look like this:

$ ./check_netapp_ems event -H filer --name=wafl.vol.autoSize.done --alarm=WARNING (to get a WARNING about every autoSize.done event that occurs)

$ ./check_netapp_ems event -H filer --name='*' --severity=high --alarm=CRITICAL (to get a CRITICAL notification about every event with a high severity, regardless of the event's name)

RESTful API

Like all plugins of the Check NetApp-REST family, check_netapp_ems checks filers that support a RESTful API. This is true for ONTAP 9.6 and later.
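To give an idea of what a REST query for EMS events might look like, here is a small sketch that only builds a request URL. It assumes the ONTAP REST endpoint /api/support/ems/events with the query parameters message.name, fields and max_records; how check_netapp_ems actually talks to the filer is not described here, and the helper ems_events_url is purely hypothetical.

```python
from urllib.parse import urlencode


def ems_events_url(host, event_name, max_records=100):
    """Build a URL for fetching EMS events by name (hypothetical helper).

    Assumes the ONTAP REST endpoint /api/support/ems/events and its
    message.name, fields and max_records query parameters.
    """
    query = urlencode(
        {
            "message.name": event_name,
            "fields": "time,message.name,message.severity",
            "max_records": max_records,
        }
    )
    return f"https://{host}/api/support/ems/events?{query}"


url = ems_events_url("filer", "wafl.vol.autoSize.done")
```

The resulting URL could then be requested with any HTTP client, using credentials that are allowed to read the EMS log.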

Documentation and Availability

The check_netapp_ems plugin first appeared in the v1.1.0 release. More details can be found on our documentation site.

Possible enhancements

The present implementation filters events based on their names only. If you would like to see a plugin that triggers an alarm through your Nagios-compatible monitoring system for events matching a given name and severity, drop us a line and we will be happy to implement that as well.