LSI MegaRAID problems

Okay, for the time being I’d advise you to stay well away from LSI Logic MegaRAID SAS PCI Express ROMB controllers (no precise versions unfortunately, but the ones I’m having trouble with have a “Web BIOS”).

Basically, these controllers don’t seem able to keep hold of RAID configs. This may have something to do with slow v. fast initialisation, but at the end of the day it’s up to the RAID controller to protect the disks, which this one doesn’t seem to be doing.

To explain further, we’ve had a lot of problems with some servers running these controllers. The arrays seem to fall apart at the slightest provocation (such as replacing half a RAID 1 set with a new disk), which may be because the arrays are not being fully initialised. That is why I’m testing 3 arrays at the minute: 1 “fast initialised” and 2 “slow initialised”. I’m not sure how these controllers work, but I’ve been told that they keep a lot of the config. on the disks so that you can, theoretically, swap the disks from one server to another and pick the config. up easily.

However… what seems to be happening is that the config. definitely isn’t on all the disks. If you break a RAID array and supply a fresh disk (as would happen in a real-world failure), the controller freaks out and messes up all the arrays, not just the problematic one. This happened last Friday (21/01/2011): I started out with Array 0 (R5), Array 1 (R5) and Array 2 (R1). I pulled (carefully!) half of Array 2 and put a fresh disk in. The controller duly noted the clean disk, so I asked it to rebuild Array 2. At this point the OS crashed and I was left with Array 0 (R5), Array 1 (R1) and Array 2 (R5). How it arrived at this config. I don’t know, but it did. It also started complaining that the arrays were (naturally) severely degraded and that some of the disks were even completely offline. I’ll update this post soon, but at the moment these controllers aren’t looking good.

