[Beowulf] RAID5 rebuild, remount with write without reboot?
Joe Landman
joe.landman at gmail.com
Tue Sep 5 11:06:54 PDT 2017
On 09/05/2017 01:28 PM, mathog wrote:
> Short form:
>
> An 8-disk RAID5 (all 2 TB SATA) on an LSI MR-USAS2 SuperMicro
> controller (lspci shows "LSI Logic / Symbios Logic MegaRAID SAS 2008
> [Falcon]") was long ago configured with a small partition of one disk
> as /boot, and with logical volumes for / (root) and /home on a single
> large virtual drive on the RAID. Due to disk problems and a
> self-inflicted error (see below), the array went into a degraded=1
> state (as reported by megacli) and write-locked both root and home.
> When the failed disk was replaced and the rebuild completed, both
> were still write-locked. "mount -a" didn't help in either case. A
> reboot brought them up normally, but ideally that should not have
> been necessary. Is there a method to remount the logical volumes
> writable that does not require a reboot?
Generally the FW would write-lock it. A

   mount -o remount,rw $path

may not clear this. I've often found that I need to do something akin to

   echo "- - -" > /sys/class/scsi_host/host0/scan

for each SCSI host bus. Another thing to try is to remove the driver
and modprobe it again. However, as your /boot and / are on it, this
probably won't work well.
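Putting those together, a minimal sketch (the mount points are
assumptions; adjust to your LV layout):

   # rescan every SCSI host so the kernel sees the controller's new state
   for h in /sys/class/scsi_host/host*/scan ; do
       echo "- - -" > "$h"
   done

   # then try flipping the filesystems back to read-write
   mount -o remount,rw /
   mount -o remount,rw /home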
A reboot has this same effect, though, so you did this sort of by default.
Regards,
Joe
>
> Long form:
>
> Periodic testing of the disks inside this array turned up pending
> sectors with this command:
>
>    smartctl -a /dev/sda -d sat+megaraid,7
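>
> To sweep all eight disks in one pass, a loop like this works (a
> sketch; the 0-7 range and the "pending" grep are assumptions for this
> particular setup):
>
>    for n in $(seq 0 7) ; do
>        echo "=== megaraid disk $n ==="
>        smartctl -a /dev/sda -d sat+megaraid,$n | grep -i pending
>    done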
>
> A replacement disk was obtained and the usual replacement method applied:
>
>    megacli -pdoffline -physdrv[64:7] -a0       # force the drive offline
>    megacli -pdmarkmissing -physdrv[64:7] -a0   # mark it missing from the array
>    megacli -pdprprmv -physdrv[64:7] -a0        # prepare it for removal (spin down)
>    megacli -pdlocate -start -physdrv[64:7] -a0 # flash the slot's locate LED
>
> The disk with the flashing light was physically swapped. The smartctl
> command was run again and unfortunately its values were unchanged. I
> had always assumed that the "7" in that smartctl command was a
> physical slot; it turns out that it is actually the "Device ID". In
> my defense, the smartctl man page does a very poor job of describing
> this:
>
>    megaraid,N - [Linux only] the device consists of one or more
>    SCSI/SAS disks connected to a MegaRAID controller. The non-negative
>    integer N (in the range of 0 to 127 inclusive) denotes which disk
>    on the controller is monitored. Use syntax such as: [...]
>
> In this system, unlike the others I had worked on previously, Device
> IDs and slots were not 1:1.
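> The slot-to-Device ID mapping can be listed directly (a sketch; the
> adapter number is an assumption):
>
>    megacli -PDList -a0 | egrep 'Slot Number|Device Id'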
>
> Anyway, about a nanosecond after this was discovered, the disk at
> Device ID 7 was marked as Failed by the controller, whereas
> previously it had been "Online, Spun Up". Ugh. At that point the
> logical volumes were all set read-only and the OS became barely
> usable, with commands like "more" no longer functioning. Megacli and
> sshd, thankfully, still worked. Figuring that I had nothing to lose,
> the replacement disk was removed from slot 7 and the original,
> hopefully still-good disk was put back. That left the system in this
> state:
>
>    slot 4 (device ID 7): Failed
>    slot 7 (device ID 5): Offline
>
> and
>
>    megacli -PDOnline -physdrv[64:7] -a0
>
> changed that to:
>
>    slot 4 (device ID 7): Failed
>    slot 7 (device ID 5): Online, Spun Up
>
> The logical volumes were still read-only, but "more" and most other
> commands now worked again. Megacli still showed the "degraded" value
> as 1. I'm still not clear how the two "read-only" states differed so
> as to cause this change.
>
> At that point the failed disk in slot 4 (not 7!) was replaced with
> the new disk (which had briefly been in slot 7) and it immediately
> began to rebuild. Something on the order of 48 hours later the
> rebuild completed, and the controller set "degraded" back to 0.
> However, the logical volumes were still read-only. "mount -a" didn't
> fix it, so the system was rebooted, which worked.
>
>
> We have two of these backup systems. They are supposed to have
> identical contents but do not; fixing that is another item on a long
> to-do list. RAID6 would have been a better choice for this much
> storage, but it does not look like this card supports it:
>
>    RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with
>    spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span,
>    PRL11-RLQ0 DDF layout with span
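>
> For reference, a capability list like that can be pulled from the
> adapter information; something like this should reproduce it (the
> exact field name may vary with the MegaCli version):
>
>    megacli -AdpAllInfo -a0 | grep -A2 'RAID Level Supported'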
>
> That rebuild time is far too long for comfort. Had another disk
> failed in those two days, that would have been it. Neither controller
> has battery backup, and the one in question is not even on a UPS, so
> a power glitch could have been fatal too. Not a happy thought while
> record SoCal temperatures persisted throughout the entire rebuild!
> The systems are in different buildings on the same campus, sharing
> the same power grid. There are no other backups for most of this
> data.
>
> Even though the controller shows this system as no longer degraded,
> should I believe that there was no data loss? I can run checksums on
> all the files (even though it will take forever) and compare the two
> systems. But as I said previously, the files were not entirely 1:1,
> so there are certainly going to be some files on this system which
> have no match on the other.
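>
> A sketch of that comparison (hostname and paths are placeholders;
> md5sum could be swapped for a faster hash):
>
>    cd /home && find . -type f -print0 | xargs -0 md5sum | sort -k2 > /tmp/local.md5
>    ssh otherhost "cd /home && find . -type f -print0 | xargs -0 md5sum" | sort -k2 > /tmp/remote.md5
>    diff /tmp/local.md5 /tmp/remote.md5
>
> A controller-level consistency check (megacli -LDCC -Start -L0 -a0)
> would only verify that the parity matches the data, not that the data
> itself is intact.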
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman