Possible false negative: Broken disks have no node

Recently we received an interesting bug report: a broken disk did not raise an alarm in the check_netapp_disk container-type plugin. This false negative was caused by a configuration in which disks are checked per node:

check_netapp_disk container-type -H filer … --include=~^NODE-B\.

This checks all disks on NODE-B. But a broken disk may not belong to any node. Its name then carries no node prefix, so it does not match the filter and won't show up in the results.

So although the plugin’s output is technically correct, it is clearly not what the user intended when creating this check.
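
The effect can be reproduced with a few lines of Python. This is only a sketch of the matching logic, not the plugin's own code: it assumes that the part after the ~ is treated as a regular expression and matched against the disk name, and the disk names are made up for illustration, modeled on the output shown in this post.

```python
import re

# Hypothetical disk names; ".1.1.7" stands for a broken disk without a node.
disk_names = ["NODE-A.1.0.2", "NODE-B.1.1.3", "NODE-B.1.1.19", ".1.1.7"]

# The pattern from --include=~^NODE-B\. (assumed to be a plain regex).
include_node_b = re.compile(r"^NODE-B\.")

checked = [name for name in disk_names if include_node_b.search(name)]
skipped = [name for name in disk_names if not include_node_b.search(name)]

print("checked:", checked)
print("skipped:", skipped)  # the nodeless disk ends up here
```

The nodeless disk lands in the skipped list together with the disks of NODE-A, which is why the check stays silent about it.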

Possible solutions

You could reverse the logic by excluding the other node. For example, in a two-node cluster, --exclude=~^NODE-A\. results in:

NETAPP DISK CONTAINER TYPE CRITICAL - 24 disks checked, 2 CRITICAL
24 instances skipped because of --include/--exclude settings
NODE-B.1.1.3: broken (CRITICAL)
.1.1.7: broken (CRITICAL)
NODE-B.1.1.19: shared
NODE-B.1.1.21: shared
NODE-B.1.1.16: shared
…

Alternatively, you can use the parameter --include=~^\.\d to configure a dedicated check for nodeless disks. In this example the result would be:

NETAPP DISK CONTAINER TYPE CRITICAL - 1 disk checked, 1 CRITICAL
47 instances skipped because of --include/--exclude settings
.1.1.7: broken (CRITICAL)
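
To compare the two filters before changing a check, they can be tried against a list of disk names. The following Python sketch is an illustration only: select_disks is a hypothetical helper, not part of the plugin, the disk names are invented, and it assumes that the include and exclude patterns are ordinary regular expressions matched against the disk name.

```python
import re

def select_disks(names, include=None, exclude=None):
    """Keep names that match include (if given) and do not match exclude (if given)."""
    kept = []
    for name in names:
        if include is not None and not re.search(include, name):
            continue
        if exclude is not None and re.search(exclude, name):
            continue
        kept.append(name)
    return kept

# Hypothetical disk names; ".1.1.7" is a broken disk without a node.
all_disks = ["NODE-A.1.0.2", "NODE-B.1.1.3", ".1.1.7", "NODE-B.1.1.19"]

# Variant 1: exclude the other node, so nodeless disks stay in the result.
print(select_disks(all_disks, exclude=r"^NODE-A\."))

# Variant 2: dedicated check that only looks at nodeless disks.
print(select_disks(all_disks, include=r"^\.\d"))
```

With the exclude variant the nodeless disk is checked together with the disks of NODE-B. With the second variant it is the only disk checked, so it needs its own service next to the per-node checks.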
