Improved Error Messages

The most common error messages during fresh installations are those related to the performance data collected by the getters. Previously, these messages only stated that the delta of the check is too big or too small for the interval at which the getter collects the data, and that the settings have to be adjusted. Soon the plugin will analyze this data and suggest specific solutions based on it and on the current configuration.

Examples

Below are a few examples based on a store whose data was renewed every 20 seconds; in practice these intervals typically fall between 18 and 22 seconds. (This is a typical test scenario. In a production environment, intervals shorter than 3 minutes are uncommon.)
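To get a feeling for how steady a getter runs, the gaps between consecutive calls can be computed from their timestamps. The following is a minimal sketch, not part of the plugin: the timestamps are made up for illustration, and `call_intervals` is a hypothetical helper name.

```python
from datetime import datetime

# Illustrative timestamps only (not taken from a real store).
call_times = [
    "2016-11-01 12:00:00",
    "2016-11-01 12:00:21",
    "2016-11-01 12:00:39",
    "2016-11-01 12:01:01",
    "2016-11-01 12:01:20",
]


def call_intervals(timestamps, fmt="%Y-%m-%d %H:%M:%S"):
    """Return the gaps in whole seconds between consecutive calls."""
    parsed = [datetime.strptime(t, fmt) for t in timestamps]
    return [int((b - a).total_seconds()) for a, b in zip(parsed, parsed[1:])]


intervals = call_intervals(call_times)
print(intervals)                       # [21, 18, 22, 19]
print(min(intervals), max(intervals))  # 18 22
```

With a nominal 20-second schedule, a spread like this (18 to 22 seconds) is what the examples below assume.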

Example 1: The --delta is too small / the interval of the getter is too long

The interval between calls in the store is too large for the checks `--delta` (10 seconds, with a tolerance of +/- 2). Either reconfigure the collector or adapt the `--delta` of this check.

DETAILS: The sum of the checks delta and tolerance is 12 whereas the interval between the last two calls available in store is 21 seconds. So the difference is 9 seconds. Following some recommendations how you could solve that based on the above numbers.

SOLUTION 1: Change the getters `interval` to 10.

SOLUTION 2: Change the checks `--delta` to something around 21.

The above are recommendations only! In case that your getter is running too unsteady they may be nonsense. (The *getters* switch `--explore=calls` shows a list of all calls found in the store together with their time-stamps.) Also remember that `interval` is an option of your monitoring-systems configuration for servicechecks whereas `--delta` and `--tolerance` are check-plugin arguments.
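The numbers in the DETAILS part of the message can be reproduced with simple arithmetic. The sketch below is an illustration only, not the plugin's code: the function names are hypothetical, and the classification rule (the interval must lie within delta plus or minus tolerance) is an assumption based on the wording "tolerance of +/- 2".

```python
def interval_gap(delta, tolerance, interval):
    """Return (sum of delta and tolerance, difference to the interval)."""
    total = delta + tolerance
    return total, interval - total


def classify(delta, tolerance, interval):
    """Assumed rule: the interval should fall within delta +/- tolerance."""
    if interval > delta + tolerance:
        return "too large"
    if interval < delta - tolerance:
        return "too small"
    return "ok"


# Numbers from Example 1: --delta 10, tolerance 2, measured interval 21.
print(interval_gap(10, 2, 21))  # (12, 9)
print(classify(10, 2, 21))      # too large
```

The same two functions give a sum of 17 and a difference of 4 for a `--delta` of 15, and a sum of 32 and a difference of -11 for a `--delta` of 30.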

Example 2: The --delta is too small, but can be compensated with --tolerance

The interval between calls in the store is too large for the checks `--delta` (15 seconds, with a tolerance of +/- 2). Either reconfigure the collector or adapt the `--delta` of this check.

DETAILS: The sum of the checks delta and tolerance is 17 whereas the interval between the last two calls available in store is 21 seconds. So the difference is 4 seconds. Following some recommendations how you could solve that based on the above numbers.

SOLUTION 1: Increase the `--tolerance` to 7.

SOLUTION 2: Change the getters `interval` to 15.

SOLUTION 3: Change the checks `--delta` to something around 21.

[…]
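How large a tolerance would have to be can be estimated from the measured intervals. This is a rough sketch under the assumption that an interval is acceptable when it lies within delta plus or minus tolerance; `min_tolerance` is a hypothetical helper, and how the plugin actually derives its suggestion is not stated here.

```python
def min_tolerance(delta, intervals):
    """Smallest tolerance so that every interval lies within delta +/- tolerance."""
    return max(abs(interval - delta) for interval in intervals)


# Only the last measured interval (21 seconds) from the message above.
print(min_tolerance(15, [21]))      # 6

# The whole 18 to 22 second range of the test scenario.
print(min_tolerance(15, [18, 22]))  # 7
```

The suggested value of 7 is consistent with covering the full 18 to 22 second range rather than just the last interval, though that reading is an interpretation.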

Example 3: The --delta is too big / the interval of the getter is too small

The interval between calls in the store is too small for the checks `--delta` (30 seconds, with a tolerance of +/- 2). Either reconfigure the collector or adapt the `--delta` of this check.

DETAILS: The sum of the checks delta and tolerance is 32 whereas the interval between the last two calls available in store is 21 seconds. So the difference is -11 seconds. Following some recommendations how you could solve that based on the above numbers.

SOLUTION 1: Change the getters `interval` to 30.

SOLUTION 2: Change the checks `--delta` to something around 21.

SOLUTION 3: Change the checks `--delta` to a multiple of 21 (42, 63, …).

WARNING: Running a getter more often than needed will increase the load on your monitoring-system and filer!

[…]

The changes will be available in December 2016 with version 3.7.1. This release focuses on usability improvements and will make administration tasks easier.

Reduced outputs

For easier-to-read output, we suggest using the corresponding setting `--output=nagios|cmd|html`. For best results, use `--output=cmd` on the command line and `--output=html` for the web interface.

