Less can be more
Some time ago, we received an enquiry in which a customer illustrated a general problem using the LUN check as follows:
If I run
./check_netapp_pro.pl LunState -H fas --alarm_limit 1
This will show you all the LUNs on the cluster, both online and offline. We have systems with more than 200 LUNs, so Nagios shows a very long list of LUNs, which confuses some users.
They are asking for just a “one-liner”, such as “1 LUN is offline, 199 LUNs are online”.
I have tried to reduce the number of lines the check displays, but the only way I have found is the “--top” option. It does reduce the number of lines, but it can also cut off the offline LUNs. For example, if you have 10 LUNs offline and run with “--top=3”, it shows only 3 LUNs, and if the output is not sorted by status, you will see 3 online LUNs.
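The effect the customer describes can be reproduced with a small Go sketch. It assumes a hypothetical `Instance` type and a hypothetical `topN` helper that simply keeps the first n entries, which is roughly what a “--top”-style limit does; it is not the plugin's actual code. Without sorting, the first three of 200 LUNs are all online. Sorting by severity first brings the offline LUNs to the top, but a limit of 3 still hides the other seven.

```go
package main

import (
	"fmt"
	"sort"
)

// Instance is a hypothetical stand-in for one monitored object (e.g. a LUN)
// together with the state the check assigned to it.
type Instance struct {
	Name  string
	State string // "OK", "WARNING", "CRITICAL" or "UNKNOWN"
}

// severity ranks states so that the most severe one sorts first.
// This ranking is an assumption made for the sketch.
var severity = map[string]int{"CRITICAL": 0, "WARNING": 1, "UNKNOWN": 2, "OK": 3}

// topN mimics a "--top"-style limit: it keeps only the first n instances.
func topN(instances []Instance, n int) []Instance {
	if n < len(instances) {
		return instances[:n]
	}
	return instances
}

// sortByState returns a copy of instances sorted by severity, most severe
// first; instances with the same state keep their original order.
func sortByState(instances []Instance) []Instance {
	sorted := append([]Instance(nil), instances...)
	sort.SliceStable(sorted, func(i, j int) bool {
		return severity[sorted[i].State] < severity[sorted[j].State]
	})
	return sorted
}

func main() {
	// 200 LUNs, the last 10 of which are offline (CRITICAL).
	var luns []Instance
	for i := 1; i <= 200; i++ {
		state := "OK"
		if i > 190 {
			state = "CRITICAL"
		}
		luns = append(luns, Instance{Name: fmt.Sprintf("lun%03d", i), State: state})
	}
	fmt.Println(topN(luns, 3))              // three online LUNs, no offline one
	fmt.Println(topN(sortByState(luns), 3)) // three offline LUNs, seven others hidden
}
```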
So this customer wants the message to list only those instances (in this example, LUNs) that are not OK.
In the meantime, we have said goodbye to Perl and write new checks only in Go, so that we can offer our customers precompiled binaries. Accordingly, we have implemented this feature in the Go-based plugins such as Check NetApp-REST.
For the concrete syntax we chose --show-instances=<exit-code list>. In most cases, the combination --show-instances=CRITICAL,WARNING will be used to show all non-OK instances.
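The idea behind --show-instances can be sketched in Go as a filter rather than a cut-off: parse the list of states, then keep every instance whose state is in that list, however many there are. The names `Instance`, `parseStateList` and `filterInstances` are hypothetical, and the sketch accepts only state names (not numeric exit codes); it illustrates the concept and is not the plugins' implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// Instance is a hypothetical stand-in for one monitored object and its state.
type Instance struct {
	Name  string
	State string
}

// parseStateList turns a value such as "CRITICAL,WARNING" into a set of states.
// Case-insensitive matching and trimming of spaces are assumptions of this
// sketch, not documented plugin behaviour.
func parseStateList(value string) (map[string]bool, error) {
	valid := map[string]bool{"OK": true, "WARNING": true, "CRITICAL": true, "UNKNOWN": true}
	states := make(map[string]bool)
	for _, part := range strings.Split(value, ",") {
		state := strings.ToUpper(strings.TrimSpace(part))
		if !valid[state] {
			return nil, fmt.Errorf("invalid state %q", part)
		}
		states[state] = true
	}
	return states, nil
}

// filterInstances keeps, in their original order, only the instances whose
// state is in the selected set.
func filterInstances(instances []Instance, show map[string]bool) []Instance {
	var kept []Instance
	for _, inst := range instances {
		if show[inst.State] {
			kept = append(kept, inst)
		}
	}
	return kept
}

func main() {
	luns := []Instance{
		{"lun_db01", "OK"},
		{"lun_db02", "CRITICAL"},
		{"lun_log01", "WARNING"},
		{"lun_web01", "OK"},
	}
	show, err := parseStateList("CRITICAL,WARNING")
	if err != nil {
		fmt.Println(err)
		return
	}
	// Only the non-OK LUNs are printed: lun_db02 and lun_log01.
	for _, inst := range filterInstances(luns, show) {
		fmt.Printf("%s is %s\n", inst.Name, inst.State)
	}
}
```

Unlike a top-N limit, the size of the filtered output in this sketch depends only on how many instances are in the selected states: ten offline LUNs produce ten lines, and a fully healthy system produces none.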
Documentation and Availability
The --show-instances option first appeared in release v1.2.0 of all Check NetApp-REST plugins. The Check E-Series plugins will also get this option with their next release.
More details can be found at https://docs.monitoring-plugins.pro/global-topics/show-instances.