rm_ack – an innovative switch with the additional touch

The abbreviation rm_ack stands for "remove service acknowledgement" and refers to a rather simple algorithm for identifying masked alarms. Masked alarms occur only with checks that cover multiple instances, such as a single check of all volumes on a NetApp device. In a nutshell, a critical service check in Nagios may be acknowledged (Service Acknowledgement) because of one specific volume. Once it is acknowledged, Nagios stops sending alert messages for that check. In the meantime, a second, perhaps more important volume might reach a critical state. Because the overall check is already critical and already acknowledged, its state does not change, and the people who should be notified receive no alert that another volume is about to fail. In other words, the first failure and its acknowledgement mask all subsequent failures, hence the name masked alarms.
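The masking effect can be illustrated with a small model. This is only a sketch under simplifying assumptions: the overall check is assumed to report the worst state of its instances, and notification is reduced to "alert unless acknowledged". The volume names and both helper functions are hypothetical and not part of the plugin.

```python
OK, WARNING, CRITICAL = 0, 1, 2


def overall_state(volumes):
    """Assumption: the overall check reports the worst state of all instances."""
    return max(volumes.values(), default=OK)


def should_notify(state, acknowledged):
    """Simplified model: a problem triggers alerts unless it is acknowledged."""
    return state != OK and not acknowledged


# Check 1: vol0 fails, the overall check turns CRITICAL and alerts.
check1 = {"vol0": CRITICAL, "vol1": OK}
first_alert = should_notify(overall_state(check1), acknowledged=False)

# An operator acknowledges the problem. Check 2: vol1 fails as well, but
# the overall state is unchanged, so the acknowledgement stays in place.
check2 = {"vol0": CRITICAL, "vol1": CRITICAL}
second_alert = should_notify(overall_state(check2), acknowledged=True)

print(first_alert, second_alert)  # True False
```

The failure of vol1 never reaches anyone, because from the outside the overall check looks exactly the same in both runs.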

The algorithm activated by –rm_ack compares the current results with those of the previous check and decides whether the cause has changed. If it detects an impairment in one of the instances, the plugin does the following:

It adds a notice to the Nagios output (GUI and alert) stating that the cause has changed.

It resets the Service Acknowledgement in Nagios, so that Nagios sends alert messages again.
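The two steps above can be sketched roughly as follows. This is not the plugin's actual implementation: the comparison rule (an instance counts as impaired when its state is worse than in the previous check), the function names, and the host and service names are assumptions made for illustration. The only Nagios-specific part is the `REMOVE_SVC_ACKNOWLEDGEMENT` external command, which Nagios accepts through its external command file.

```python
import time

OK, WARNING, CRITICAL = 0, 1, 2


def worsened_instances(previous, current):
    """Return the instances whose state is worse than in the previous check."""
    return sorted(
        name for name, state in current.items()
        if state > previous.get(name, OK)
    )


def remove_ack_command(host, service, now=None):
    """Format the Nagios external command that removes a service acknowledgement."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] REMOVE_SVC_ACKNOWLEDGEMENT;{host};{service}"


def evaluate(previous, current, host, service, now=None):
    """Return (notice, command), or (None, None) if no instance got worse."""
    changed = worsened_instances(previous, current)
    if not changed:
        return None, None
    notice = "Change in cause: " + ", ".join(changed)
    return notice, remove_ack_command(host, service, now)


# Hypothetical example: vol0 was already critical, vol1 fails in this run.
previous = {"vol0": CRITICAL, "vol1": OK}
current = {"vol0": CRITICAL, "vol1": CRITICAL}
notice, command = evaluate(previous, current, "netapp01", "volumes", now=1700000000)
print(notice)
print(command)
```

In a real setup the command line would be written to the Nagios command file so that the acknowledgement is dropped and notifications resume. The previous results would also have to be stored between check runs, which this sketch leaves out.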

The following table depicts the course of events of a check with multiple instances (Overall check), where failures or errors are no longer being reported due to the acknowledgement (ACK) of a previous failure.

The RM_ACK function resets the acknowledgements in Nagios. Resets are represented by green arrows in the table.
