First Equallogic Disk Failure in Two Years

By admin, December 12, 2012 18:24

Just received three email alerts simultaneously from SAN HQ, Dell OME and EQL Group Manager all saying slot 5 in one of the Equallogic members has failed, the last disk (slot 15) kicked in and the raid set is reconstructing. Called local Dell ProSupport, parts is being arranged and will be delivered to data center within 2-3 hours. (Update, raid reconstruction took about 3 1/2 hours to complete)

It was quite a black day today as this is the 2nd incident happening to my equipment, and Equallogic SAN was under very light load and the disk just failed without any pre-warning. Where is the predicative disk failure feature in Equallogic after all?

1

Alert from Dell OME:
Device: , Service Tag:, Asset Tag:, Date:12/12/12, Time:17:46:17:000, Severity:Critical, Message:Sent when eqlDiskStatus changes from one state to another state. Variables: eqlDiskStatus=Failed,eqlDiskSlot=5

Alert from EQL Group Manager:
Warning health conditions currently exist.
Correct these conditions before they affect array operation.
Non-fatal RAIDset failure. While the RAID set is degraded, performance and availability might be decreased. There are 1 outstanding health conditions. Correct these conditions before they affect array operation.
Failure: HDD Drive: 5, Model: ST3600057SS , Serial Number: 3SL14VVR
Reconstruction of RAID LUN 0 initiated.

Alert from SAN HQ

  • 12/12/2012 5:45:46 PM to 12/12/2012 5:47:46 PM
    • Warning: Member eql RAID Set Is Degraded
      • Warning: Member eql RAID set is degraded because a disk drive failed or was removed.
    • Warning: Member eql RAID More Spares Expected
      • Warning: Member eql The current RAID configuration requires more spare drives then are currently available.
    • Warning: Member eql has a failed drive in slot 5