We have implemented (but not yet released) an enhancement in check_netapp_takeover that checks whether the interconnect links are up and OK. This additional aspect of the takeover check is enabled by default:
```
$ ./check_netapp_takeover.pl -H fl17.europe.example.com
NETAPP_TAKEOVER OK - 9 takeover-aspects checked interconnect …
```
We have implemented (but not yet released) a new check that monitors the performance of qtrees, namely the operations per second on qtrees. The PerfQtree check is meant to keep an eye on qtrees so that, for example, out-of-control applications can be detected quickly.
Counters
These counters are supported so far by …
We have implemented (but not yet released) a change in the library responsible for the call files. Until now, all call files have been saved directly in the directory calls/. The number of files in this directory can quickly grow to 30,000 or more, which can cause performance issues with the filesystem …
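The excerpt does not say how the library now arranges the call files, so the following is only a minimal sketch of one common way to keep a single directory from growing too large: spreading files over subdirectories named after a short hash prefix. The function names `sharded_path` and `save_call_file`, the use of SHA-1, and the two-character bucket width are all assumptions made for illustration and are not taken from the plugin library.

```python
import hashlib
from pathlib import Path


def sharded_path(base_dir: Path, file_name: str, width: int = 2) -> Path:
    """Return base_dir/<bucket>/<file_name>.

    The bucket is the first `width` hex digits of the SHA-1 of the file
    name, so the same name always maps to the same subdirectory.
    (Hypothetical layout, not the layout used by the real library.)
    """
    digest = hashlib.sha1(file_name.encode("utf-8")).hexdigest()
    return base_dir / digest[:width] / file_name


def save_call_file(base_dir: Path, file_name: str, content: str) -> Path:
    """Write `content` to the sharded location, creating the bucket if needed."""
    target = sharded_path(base_dir, file_name)
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(content, encoding="utf-8")
    return target
```

With a two-character hex prefix there are 256 possible buckets, so 30,000 files would average roughly 117 files per subdirectory instead of 30,000 in one.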
We have implemented (but not yet released) an enhancement for the cluster-mode getter get_netapp_cm. Based on a simple change in the user interface, you can now tell the getter to retrieve the data of more than one object at once. For example:
```
get_netapp_cm.pl -H sim91 -o volume -o aggregate
```
Data collected for these …
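To illustrate the idea of a repeatable `-o` option, here is a minimal sketch in Python. It is not the getter's actual code (the getter is a Perl script whose implementation is not shown here); the function name `parse_args`, the attribute names `host` and `objects`, and the printed message are assumptions for illustration only.

```python
import argparse


def parse_args(argv):
    """Parse a host and one or more object types from a command line."""
    parser = argparse.ArgumentParser(
        description="hypothetical multi-object getter front end"
    )
    parser.add_argument("-H", dest="host", required=True, help="host to query")
    # action="append" collects every occurrence of -o into one list
    parser.add_argument(
        "-o",
        dest="objects",
        action="append",
        required=True,
        help="object type to retrieve; may be given more than once",
    )
    return parser.parse_args(argv)


args = parse_args(["-H", "sim91", "-o", "volume", "-o", "aggregate"])
for obj in args.objects:
    # A real getter would query the API here; this sketch only reports the plan.
    print(f"would retrieve '{obj}' data from {args.host}")
```

Collecting the values into a list keeps the single-object call unchanged, since one `-o` simply yields a list with one entry.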
While making most of the checks compatible with the upcoming DataONTAP 9.3, we unfortunately broke the 7m backward compatibility in some direct (*) checks such as check_netapp_scrub and check_netapp_spare. We are working on an update to fix this and will announce its availability here. *) direct checks retrieve …
Our fairly new check_netapp_takeover has been in the field for several weeks, and we have found some issues in how it handles API errors and how it presents the errors it finds.
Handling of API Errors
Sometimes the API returns with a message similar to “A metrocluster check operation is in …
Overview
During an internal code review we found a bug in the ServiceProcessor check. The present version exits with OK even if errors were found (false negative).
Mitigation
Customers using the ServiceProcessor check to monitor the status of the internal service processors (all other checks are not …
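The excerpt does not describe the cause of the bug, so the following is only a generic sketch of how a Nagios-style plugin can derive its exit status from individual findings so that a detected error can never end in OK. The exit codes 0 to 3 follow the usual Nagios plugin convention; the function name `overall_state` and the sample findings are hypothetical and not taken from the ServiceProcessor check.

```python
# Conventional Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def overall_state(findings):
    """Return the exit code for a list of per-item states.

    `findings` may contain only OK, WARNING or CRITICAL. The most severe
    one wins. An empty list means nothing was checked, which is reported
    as UNKNOWN rather than silently as OK.
    """
    if not findings:
        return UNKNOWN
    return max(findings)


# Hypothetical example: two healthy service processors and one with an error.
findings = [OK, CRITICAL, OK]
exit_code = overall_state(findings)
print(f"exit code would be {exit_code}")
# A real plugin would finish with sys.exit(exit_code).
```

Deriving the exit code in one place from all findings makes it harder for an error that was detected to be lost before the plugin exits.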
We have a new check, proposed by one of our customers who had an issue with a single process eating up all the CPU time on a filer. It’s easy to identify the culprit once you are on the command line of the filer (priv-mode) by issuing the ps command. To automate that sort of monitoring and getting an alarm …
The recently implemented Update-Mode for the getters led to a reduction of between 30 and 50% in the monitoring system’s CPU load, reports one of our customers, a large automotive company from Germany. The background of this significant performance gain is their unusual configuration system, which runs the getter for every …
I can recommend the unstable release 3.10.1_10 to all monitoring admins who like to experiment. Above all, this release has the character of a technology preview. It includes two major innovations:
- The update mode for all getters
- The option to output Grafana-compatible performance data even for status checks
Update Mode
The update …