One of our customers proposed a new check after running into a single process that ate up all the CPU time on a filer. The culprit is easy to identify once you are on the filer's command line (priv mode) by issuing the ps command. To automate that sort of monitoring and get an alarm …
The recently implemented update mode for the getters led to a reduction of between 30 and 50% in the monitoring system's CPU load, as reported by one of our customers, a large automotive company from Germany. The background to this significant performance gain is their unusual configuration system, which runs the getter for every …
I can recommend the unstable release 3.10.1_10 to all monitoring admins who like to experiment. It is, above all, a technology preview. Included are two major innovations:
- The update mode for all getters
- The option to output Grafana-compatible performance data even for status checks.

Update Mode

The update …
Mapped and online LUNs can remain unused in a filer because a connected initiator is missing. With the NetApp CLI, such LUNs can be detected by following these steps:
First, we look for initiator groups for which **all** initiators are not logged in.
node01::> lun igroup show -igroup \*group06\* -instance Vserver …
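The selection rule from the first step can be sketched in Python. This is only an illustration of the "all initiators are not logged in" condition: the function name `igroups_without_logins` and the input mapping are hypothetical, and the sketch does not parse real CLI output.

```python
from typing import Dict, List


def igroups_without_logins(igroups: Dict[str, List[bool]]) -> List[str]:
    """Return the names of igroups in which no initiator is logged in.

    `igroups` is a hypothetical mapping from igroup name to one
    logged-in flag per initiator. Igroups without any initiator are
    skipped here; that is a choice made for this sketch.
    """
    return sorted(
        name
        for name, logged_in in igroups.items()
        if logged_in and not any(logged_in)
    )


if __name__ == "__main__":
    # Made-up example data, not taken from a real filer.
    example = {
        "group06a": [False, False],  # nobody logged in: candidate
        "group06b": [True, False],   # one initiator logged in: ignored
        "group06c": [],              # no initiators at all: skipped
    }
    print(igroups_without_logins(example))
```

An igroup with at least one logged-in initiator is deliberately left out, because its LUNs may still be in use.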
One of the most innovative features of check_netapp_pro is probably the option -rm_ack, which solves the problem of errors raising no alarm when an overall check has already been acknowledged. Such errors are not actively alarmed and can therefore easily be overlooked. This switch might soon be replaced by another, more sophisticated …
Our new check PerfAggregate measures various performance counters per aggregate. Currently, the following counters can be monitored:
- aggr_throughput (B/s)
- latency (µs)
- read_data (B/s)
- total_transfers (/s)
- total_transfers_hdd (/s)
- total_transfers_ssd (/s)
- transmit_failure (-)
- user_read_blocks (/s)
- user_read_latency (µs)
- …
The newly published check_netapp_takeover has been enhanced to check the MetroCluster configuration as well. All of these checks are done in a single service check. Example:
```
$ check_netapp_takeover.pl -H filer01f
NETAPP_TAKEOVER OK node filer0101 connected (ha mode): The the storage failover facility is enabled. Takeover of …
```
If you are checking your filer with the Snapshots plugin, you may get an error like: `Can't call method "attr" on an undefined value at /opt/.../lib/nagios/plugins/check_netapp_pro/checks/Snapshots.pm line 289.` The reason is that the SnapDrive application creates FlexClones, which are created, …
The UsageTrend check can be used to extrapolate the usage of a volume or aggregate into the future and send an alarm if it would fill up within a given time. Until now, this check measured the size in bytes. It now accepts an additional argument, --what=size|inodes. Setting this argument to inodes tells the …
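The idea of projecting usage into the future can be sketched as a simple linear fit. This is a minimal sketch under stated assumptions, not the plugin's actual algorithm: the function name `seconds_until_full`, the sample format and the least-squares approach are all chosen for illustration. The same arithmetic works whether the samples count bytes or inodes.

```python
from typing import Optional, Sequence, Tuple


def seconds_until_full(
    samples: Sequence[Tuple[float, float]], capacity: float
) -> Optional[float]:
    """Estimate the seconds from the last sample until usage hits capacity.

    `samples` holds (timestamp_in_seconds, used) pairs in chronological
    order; `used` and `capacity` share one unit (bytes or inodes).
    Returns None if there are too few samples or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None  # all samples share one timestamp
    # Least-squares slope: growth in units per second.
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None  # flat or shrinking usage never fills up
    intercept = mean_u - slope * mean_t
    t_full = (capacity - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


if __name__ == "__main__":
    # Made-up samples: usage grows by 100 units per hour.
    history = [(0, 100), (3600, 200), (7200, 300)]
    remaining = seconds_until_full(history, capacity=1000)
    print(f"full in about {remaining / 3600:.1f} hours")
```

A check built on this idea would compare the returned value with the configured time window and raise an alarm when the projected fill time falls inside it.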
Obviously, one of the first plugins that we implemented was designed to monitor volumes and aggregates and their respective usage (“how many bytes are still available”). This plugin is called Usage and from the get-go it could monitor disk space usage in bytes as well as the number available (or used) …