The abbreviation rm_ack stands for remove service acknowledgement and refers to a simple algorithm for identifying masked alarms. Masked alarms occur only with checks that cover multiple instances, such as the simultaneous checking of all volumes on a NetApp device. In a nutshell: such a check turns critical in Nagios because of one specific volume, and someone acknowledges it (Service Acknowledgement). From that point on, Nagios stops sending alert messages for the check. In the meantime, a second, possibly more important volume may reach a critical state. Because the overall check is already critical and has already been acknowledged, its state does not change, and the people who should be notified receive no alarm message that another volume is about to fail. In other words, the first failure and its acknowledgement mask all subsequent failures, hence the name masked alarms.
The algorithm activated by --rm_ack compares the current results with those of the previous check and decides whether the cause of the problem has changed. If one of the instances has newly deteriorated, the plugin does the following:
- Adds a notice to the Nagios output (GUI and alert) stating that the cause has changed.
- Resets the service acknowledgement in Nagios, so that Nagios sends alert messages again.
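The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual implementation: the function names, the host and service names, and the volume data are hypothetical, and the sketch only treats an instance that newly becomes CRITICAL as a change in cause. The only Nagios-specific piece is the REMOVE_SVC_ACKNOWLEDGEMENT external command, which the sketch formats as a string rather than writing it to the Nagios command file.

```python
import time

CRITICAL = 2  # Nagios plugin return code for CRITICAL


def changed_cause(previous, current):
    """Return instances that are CRITICAL now but were not at the last check."""
    was_critical = {name for name, state in previous.items() if state == CRITICAL}
    now_critical = {name for name, state in current.items() if state == CRITICAL}
    return sorted(now_critical - was_critical)


def remove_ack_command(host, service, now=None):
    """Format the Nagios external command that removes a service acknowledgement."""
    timestamp = int(time.time() if now is None else now)
    return f"[{timestamp}] REMOVE_SVC_ACKNOWLEDGEMENT;{host};{service}"


def evaluate(host, service, previous, current):
    """Return (notice, command); both are None when the cause is unchanged."""
    new_failures = changed_cause(previous, current)
    if not new_failures:
        return None, None
    notice = "Change in cause: " + ", ".join(new_failures) + " now CRITICAL"
    return notice, remove_ack_command(host, service)


# Hypothetical data: vol1 was already critical and acknowledged, vol2 fails later.
previous = {"vol1": 2, "vol2": 0}
current = {"vol1": 2, "vol2": 2}
notice, command = evaluate("netapp01", "volumes", previous, current)
```

In this sketch, `notice` would be appended to the plugin output and `command` would be submitted to Nagios, which then treats the service as unacknowledged and resumes notifications. How the real plugin stores the previous results and submits the command is not covered here.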
The following table depicts the course of events of a check with multiple instances (Overall check), where failures or errors are no longer being reported due to the acknowledgement (ACK) of a previous failure.
The RM_ACK function resets the acknowledgements in Nagios. Resets are represented by green arrows in the table.