[Beowulf] RAID5 rebuild, remount with write without reboot?
Joe Landman
joe.landman at gmail.com
Tue Sep 5 11:06:54 PDT 2017
On 09/05/2017 01:28 PM, mathog wrote:
> Short form:
>
> An 8-disk RAID5 (all 2 TB SATA) on an LSI MR-USAS2 SuperMicro
> controller (lspci shows "LSI Logic / Symbios Logic MegaRAID SAS 2008
> [Falcon]") was long ago configured with a small partition of one disk
> as /boot, and with logical volumes for / (root) and /home on a single
> large virtual drive on the RAID. Due to disk problems and a
> self-inflicted error (see below), the array went into a degraded=1
> state (as reported by megacli) and write-locked both root and home.
> When the failed disk was replaced and the rebuild completed, both
> were still write-locked. "mount -a" didn't help in either case. A
> reboot brought them up normally, but ideally that should not have
> been necessary. Is there a method to remount the logical volumes
> writable that does not require a reboot?
Generally the FW would write-lock it. A

   mount -o remount,rw $path

may not clear this. I've often found that I need to do something akin to

   echo "- - -" > /sys/class/scsi_host/host0/scan

for each SCSI host bus. Another thing to try is to remove the driver
and modprobe it again. However, as your /boot and / are on it, this
probably won't work well.
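Putting those together, a minimal sketch (the mount points are
assumptions; adjust to your LV layout):

   # rescan every SCSI host so the kernel sees the controller's new state
   for h in /sys/class/scsi_host/host*/scan ; do
       echo "- - -" > "$h"
   done

   # then try flipping the filesystems back to read-write
   mount -o remount,rw /
   mount -o remount,rw /home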
A reboot has this same effect, though, so you did this sort of by default.
Regards,
Joe
>
> Long form:
>
> Periodic testing of the disks inside this array turned up pending
> sectors with this command:
>
>    smartctl -a /dev/sda -d sat+megaraid,7
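>
> To sweep all eight disks in one pass, a loop like this works (a
> sketch; the 0-7 range and the "pending" grep are assumptions for this
> particular setup):
>
>    for n in $(seq 0 7) ; do
>        echo "=== megaraid disk $n ==="
>        smartctl -a /dev/sda -d sat+megaraid,$n | grep -i pending
>    done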
>
> A replacement disk was obtained and the usual replacement method applied:
>
>    megacli -pdoffline -physdrv[64:7] -a0       # force the drive offline
>    megacli -pdmarkmissing -physdrv[64:7] -a0   # mark it missing from the array
>    megacli -pdprprmv -physdrv[64:7] -a0        # prepare it for removal (spin down)
>    megacli -pdlocate -start -physdrv[64:7] -a0 # flash the slot's locate LED
>
> The disk with the flashing light was physically swapped. The smartctl
> command was run again and unfortunately its values were unchanged. I
> had always assumed that the "7" in that smartctl command was a
> physical slot; it turns out that it is actually the "Device ID". In
> my defense, the smartctl man page does a very poor job of describing
> this:
>
>    megaraid,N - [Linux only] the device consists of one or more
>    SCSI/SAS disks connected to a MegaRAID controller. The non-negative
>    integer N (in the range of 0 to 127 inclusive) denotes which disk
>    on the controller is monitored. Use syntax such as: [...]
>
> In this system, unlike the others I had worked on previously, Device
> IDs and slots were not 1:1.
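> The slot-to-Device ID mapping can be listed directly (a sketch; the
> adapter number is an assumption):
>
>    megacli -PDList -a0 | egrep 'Slot Number|Device Id'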
>
> Anyway, about a nanosecond after this was discovered, the disk at
> Device ID 7 was marked as Failed by the controller, whereas
> previously it had been "Online, Spun Up". Ugh. At that point the
> logical volumes were all set read-only and the OS became barely
> usable, with commands like "more" no longer functioning. Megacli and
> sshd, thankfully, still worked. Figuring that I had nothing to lose,
> the replacement disk was removed from slot 7 and the original,
> hopefully still-good disk was put back. That left the system in this
> state:
>
>    slot 4 (device ID 7): Failed
>    slot 7 (device ID 5): Offline
>
> and
>
>    megacli -PDOnline -physdrv[64:7] -a0
>
> changed that to:
>
>    slot 4 (device ID 7): Failed
>    slot 7 (device ID 5): Online, Spun Up
>
> The logical volumes were still read-only, but "more" and most other
> commands now worked again. Megacli still showed the "degraded" value
> as 1. I'm still not clear how the two "read-only" states differed so
> as to cause this change.
>
> At that point the failed disk in slot 4 (not 7!) was replaced with
> the new disk (which had briefly been in slot 7) and it immediately
> began to rebuild. Something on the order of 48 hours later the
> rebuild completed, and the controller set "degraded" back to 0.
> However, the logical volumes were still read-only. "mount -a" didn't
> fix it, so the system was rebooted, which worked.
>
>
> We have two of these backup systems. They are supposed to have
> identical contents but do not; fixing that is another item on a long
> to-do list. RAID6 would have been a better choice for this much
> storage, but it does not look like this card supports it:
>
>    RAID0, RAID1, RAID5, RAID00, RAID10, RAID50, PRL 11, PRL 11 with
>    spanning, SRL 3 supported, PRL11-RLQ0 DDF layout with no span,
>    PRL11-RLQ0 DDF layout with span
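>
> For reference, a capability list like that can be pulled from the
> adapter information; something like this should reproduce it (the
> exact field name may vary with the MegaCli version):
>
>    megacli -AdpAllInfo -a0 | grep -A2 'RAID Level Supported'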
>
> That rebuild time is far too long for comfort. Had another disk
> failed in those two days, that would have been it. Neither controller
> has battery backup, and the one in question is not even on a UPS, so
> a power glitch could have been fatal too. Not a happy thought while
> record SoCal temperatures persisted throughout the entire rebuild!
> The systems are in different buildings on the same campus, sharing
> the same power grid. There are no other backups for most of this
> data.
>
> Even though the controller shows this system as no longer degraded,
> should I believe that there was no data loss? I can run checksums on
> all the files (even though it will take forever) and compare the two
> systems. But as I said previously, the files were not entirely 1:1,
> so there are certainly going to be some files on this system which
> have no match on the other.
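>
> A sketch of that comparison (hostname and paths are placeholders;
> md5sum could be swapped for a faster hash):
>
>    cd /home && find . -type f -print0 | xargs -0 md5sum | sort -k2 > /tmp/local.md5
>    ssh otherhost "cd /home && find . -type f -print0 | xargs -0 md5sum" | sort -k2 > /tmp/remote.md5
>    diff /tmp/local.md5 /tmp/remote.md5
>
> A controller-level consistency check (megacli -LDCC -Start -L0 -a0)
> would only verify that the parity matches the data, not that the data
> itself is intact.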
>
> Regards,
>
> David Mathog
> mathog at caltech.edu
> Manager, Sequence Analysis Facility, Biology Division, Caltech
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
--
Joe Landman
e: joe.landman at gmail.com
t: @hpcjoe
w: https://scalability.org
g: https://github.com/joelandman
l: https://www.linkedin.com/in/joelandman