Scott Nolin
August 2014
This information applies to Linux systems monitoring Dell MD1200 disk arrays directly attached via SAS, with no RAID card. Typically you would use vendor tools to monitor array and disk health; where those are not available, these methods should generally work.
These documents are copied from University of Wisconsin SSEC working documentation and may be useful for some, but we provide no guarantee of accuracy, correctness, or safety. Use at your own risk.
To detect a failed disk, check the output of 'zpool status'. Various scripts exist to do this for Nagios and check_mk.
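As a minimal sketch of such a check (not one of the existing scripts mentioned above), 'zpool status -x' prints only pools that have problems, or the single line "all pools are healthy" when there are none. The function name zpool_status_to_checkmk and the service name ZFS_Pools are illustrative assumptions, and any output other than the healthy message is treated as critical here.

```shell
#!/bin/bash
# Sketch: turn `zpool status -x` output into one check_mk local-check line.
# The function name and the ZFS_Pools service name are illustrative only.
zpool_status_to_checkmk() {
    # Read the whole report from stdin, flatten it to one line,
    # and trim leading/trailing spaces.
    report="$(tr -s '\n' ' ' | sed 's/^ *//; s/ *$//')"
    if [ "$report" = "all pools are healthy" ]; then
        echo "0 ZFS_Pools - ${report}"
    else
        # Anything else (degraded pool, error text, no output) is critical.
        echo "2 ZFS_Pools - ${report:-no output from zpool}"
    fi
}

# Only query ZFS when the zpool command is actually installed.
if command -v zpool >/dev/null 2>&1; then
    zpool status -x 2>&1 | zpool_status_to_checkmk
fi
```

Reading the report from stdin keeps the parsing separate from the 'zpool' call, so the function can be exercised with saved output on a machine that has no pools.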
To monitor predictive drive failure, we use 'smartctl', provided by the 'smartmontools' package on CentOS.
Example check_mk script:
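#!/bin/bash
#
# Scott Nolin
#
# check_mk local check: report the SMART health of every disk under /dev/disk/by-vdev.

# Whole-disk aliases only; partition entries (names containing "part") are skipped.
DISKS="$(/bin/ls /dev/disk/by-vdev | /bin/grep -v part)"

for DISK in ${DISKS}
do
    # Match the health line itself (e.g. "SMART Health Status: OK") so that
    # other lines mentioning SMART are ignored.
    HEALTH="$(smartctl -H "/dev/disk/by-vdev/${DISK}" | grep 'SMART Health Status')"
    # The fourth space-separated field is the status word.
    HEALTHSTATUS="$(echo "${HEALTH}" | cut -d ' ' -f 4)"
    if [[ "${HEALTHSTATUS}" != "OK" ]]; then
        status=2
    else
        status=0
    fi
    # Local check format: <status> <service name> <perfdata> <text>
    echo "$status SMART_Status_${DISK} - ${DISK} ${HEALTH}"
done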
##########################
While the above techniques tell you whether you have a disk problem, you still need to monitor the status of the arrays themselves. In our case these are MD1200 disk arrays attached via SAS. The 'sg_ses' utility from the sg3_utils package is the best answer we have found so far.
To monitor our enclosures we use this script: check_md1200.pl
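The check_md1200.pl script itself is not reproduced here. As a rough sketch of the kind of query such a check can build on, 'sg_ses --page=2' requests the SES Enclosure Status page from an enclosure's SCSI generic device. Everything below is an assumption for illustration: the device path /dev/sg3, the function name ses_status_to_checkmk, the service naming, and the expectation that each element line in the output carries a "status: <state>" field.

```shell
#!/bin/bash
# Sketch: summarise SES element states reported by sg_ses for one enclosure.
# ENCLOSURE_DEV is a hypothetical default; `lsscsi -g` lists the sg devices.
ENCLOSURE_DEV="${ENCLOSURE_DEV:-/dev/sg3}"

ses_status_to_checkmk() {
    # $1 = label used in the check_mk service name; sg_ses output on stdin.
    # Assumes each element line contains a "status: <state>" field.
    label="$1"
    page="$(cat)"
    crit="$(printf '%s\n' "$page" | grep -c -E 'status: (Critical|Unrecoverable)')"
    warn="$(printf '%s\n' "$page" | grep -c 'status: Noncritical')"
    if [ "$crit" -gt 0 ]; then
        echo "2 Enclosure_${label} - ${crit} critical, ${warn} noncritical element(s)"
    elif [ "$warn" -gt 0 ]; then
        echo "1 Enclosure_${label} - ${crit} critical, ${warn} noncritical element(s)"
    else
        echo "0 Enclosure_${label} - no element reports a fault state"
    fi
}

# Query the enclosure only when sg_ses and the device are both present.
if command -v sg_ses >/dev/null 2>&1 && [ -e "$ENCLOSURE_DEV" ]; then
    sg_ses --page=2 "$ENCLOSURE_DEV" | ses_status_to_checkmk "$(basename "$ENCLOSURE_DEV")"
fi
```

This only counts fault states; it does not interpret individual elements such as fans, power supplies, or temperature sensors, which a fuller check would report separately.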