Badly Performing Disks


PerfDisk monitors the usage of every single disk. If just one disk exceeds the threshold, the check sends an alarm. BadlyPerformingDisks is a new check that first analyses every disk, counts the disks that exceed a defined usage threshold, and only sends an alarm if a specified percentage of all disks exceeds this threshold.

This simplified example shows a scenario with 6 disks:

    ./check_netapp_pro PerfDisk ... -w 85 -c 95
    NETAPP_PRO PERFDISK WARNING - 6 disks checked, 0 critical and 3 warning
    1.10.20 (/aggr1_st6_sata/plex0/rg1): 91.7% (WARNING)
    1.10.19 (/aggr1_st6_sata/plex0/rg1): 91.2% (WARNING)
    1.10.21 (/aggr1_st6_sata/plex0/rg1): 91.2% (WARNING)
    1.10.10 (/aggr1_st6_sata/plex0/rg0): 76.9%
    1.10.5 (/aggr1_st6_sata/plex0/rg0): 76.7%
    1.10.14 (/aggr1_st6_sata/plex0/rg0): 76.4%

For this example we can configure BadlyPerformingDisks to return **OK** by defining a disk as “highly-utilized” if its usage is higher than 90%:

    ./check_netapp_pro BadlyPerformingDisks ... -w 80 -c 95 --highly_utilized=90
    NETAPP_PRO PERFDISK OK - 6 disks checked, 3 of them (50.0%) are highly utilized(usage > 90%).

In our case, 3 disks fall into this category, which is 50% of the 6 disks in total. 50% is well below the 80% set with -w in this example, which is the point above which a WARNING would be triggered.

To receive a CRITICAL result from the same data as in the PerfDisk check above, we have to lower the critical threshold below 50%, here to 45%. Shares below the warning threshold of 40% would still be regarded as OK:

    ./check_netapp_pro BadlyPerformingDisks ... -w 40 -c 45 --highly_utilized=90
    NETAPP_PRO PERFDISK CRITICAL - 6 disks checked, 3 of them (50.0%) are highly utilized(usage > 90%).

Another option is to change the threshold used to define a disk as “highly-utilized”, here to 70%:

    ./check_netapp_pro BadlyPerformingDisks ... -w 80 -c 95 --highly_utilized=70
    NETAPP_PRO PERFDISK CRITICAL - 6 disks checked, 6 of them (100.0%) are highly utilized(usage > 70%).
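The counting logic described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the plugin's actual code: the function name `evaluate` and its parameters are hypothetical, and the sketch assumes that the warning and critical percentages are compared with a strict "greater than", in the same way the check output reports "usage > 90%" for the per-disk threshold.

```python
def evaluate(usages, highly_utilized, warning, critical):
    """Classify a set of disks by the share that is highly utilized.

    usages          -- per-disk usage in percent
    highly_utilized -- usage above which a disk counts as highly utilized
    warning         -- share of such disks (percent) above which WARNING is returned
    critical        -- share of such disks (percent) above which CRITICAL is returned

    Assumption: all comparisons are strict (">"); the real check may differ.
    """
    count = sum(1 for usage in usages if usage > highly_utilized)
    percent = 100.0 * count / len(usages) if usages else 0.0

    if percent > critical:
        status = "CRITICAL"
    elif percent > warning:
        status = "WARNING"
    else:
        status = "OK"
    return status, count, percent


# Usage values taken from the PerfDisk example output above.
usages = [91.7, 91.2, 91.2, 76.9, 76.7, 76.4]

print(evaluate(usages, highly_utilized=90, warning=80, critical=95))
print(evaluate(usages, highly_utilized=90, warning=40, critical=45))
print(evaluate(usages, highly_utilized=70, warning=80, critical=95))
```

Under these assumptions the three calls reproduce the three BadlyPerformingDisks results shown above: OK with 3 of 6 disks (50.0%), CRITICAL with 3 of 6 disks (50.0%), and CRITICAL with 6 of 6 disks (100.0%).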

Evaluation per Aggregate

Just like PerfDisk, BadlyPerformingDisks can restrict its evaluation to the disks belonging to a specific aggregate. This is done by setting the option --raid_group=pattern. Since pattern is interpreted as a regular expression, --raid_group=^aggr1$ examines only the disks of the aggregate named exactly aggr1.
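The effect of the anchors ^ and $ in such a pattern can be illustrated with a short Python sketch. It assumes that the pattern is searched for anywhere within the aggregate name, which is how most regular-expression filters behave; how the plugin actually applies the pattern is not specified here. The helper `select_aggregates` and the name `aggr10` are hypothetical, while `aggr1_st6_sata` is taken from the example output further up.

```python
import re


def select_aggregates(names, pattern):
    """Return the names in which the regular expression is found.

    Assumption: the pattern may match anywhere in the name unless it is
    anchored with ^ and $.
    """
    regex = re.compile(pattern)
    return [name for name in names if regex.search(name)]


names = ["aggr1", "aggr1_st6_sata", "aggr10"]

# Anchored: only the aggregate named exactly "aggr1" is selected.
print(select_aggregates(names, r"^aggr1$"))

# Unanchored: every aggregate whose name contains "aggr1" is selected.
print(select_aggregates(names, r"aggr1"))
```

With the anchored pattern only `aggr1` is selected, while the unanchored pattern also selects `aggr1_st6_sata` and `aggr10`. This is why the anchors matter when aggregate names share a common prefix.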


