Node cloning
Robert G. Brown
rgb at phy.duke.edu
Thu Apr 5 15:47:46 PDT 2001
On Thu, 5 Apr 2001, Giovanni Scalmani wrote:
>
> Hi!
>
> On Thu, 5 Apr 2001, Oscar Roberto [iso-8859-1] López Bonilla wrote:
>
> > And then use the command (this will take long, so you can do it overnight)
> > cp /dev/hda /dev/hdb ; cp /dev/hda /dev/hdc ; cp /dev/hda /dev/hdd
>
> I also did this way for my cluster, BUT I've experienced instability
> for some nodes (3/4 over 20). My guess was that "cp /dev/hda /dev/hdb"
> copied also the bad-blocks list of hda onto hdb and this looks wrong
> to me. So I partitioned and made the filesystems on each node and then
> cloned the content of each filesystem. Those nodes are now stable.
>
> A question to the 'cp gurus' out there: is my guess correct about
> the bad blocks list?
One of many possible problems, actually. This approach to cloning
makes me shudder -- things like the devices in /dev generally have to
built, not copied, there are issues with the boot blocks and bad block
lists and the bad blocks themselves on both target and host. raw
devices are dangerous things to use as if they were flatfiles.
Tarpipes (with tar configured the same way it would be for a
backup|restore but writing/reading stdout) are a much safer way to
proceed. Or dump/restore pipes on systems that have it -- either one is
equivalent to making a backup and restoring it onto the target disk.
One reason I gave up cloning (after investing many months writing a
first generation cloning tool for nodes (which booted a diskless
configuration, formatted a local disk, and cloned itself onto the local
disk) and started a second generation GUI-driven one) was that just
cloning isn't enough. There is all sorts of stuff that needs to be done
to the clones to give them a unique identity (even something as simple
as their own ssh keys), one needs to rerun lilo, it requires that you
keep one "pristine" host to use as the master to clone or you have the
very host configuration creep you set out to avoid. Either way you end
up inevitably having to upgrade all the nodes or install security or
functionality updates.
These days there are just better ways (in my opinion) to proceed if your
goal is simple installation and easy upgrade/update and low maintenance.
Cloning is also very nearly an irreversible decision -- if you adopt
clone methods it can get increasingly difficult to maintain your cluster
without ALSO developing tools and methods that could just as easily have
been used to install and clean things up post-install.
Even so, if you are going to clone, I think that the diskless->local
clone is a very good way to proceed, because it facilitates
reinstallation and emergency operation of a node even if a hard disk
crashes (you can run it diskless while getting a replacement). It does
require either a floppy drive ($15) or a PXE chip, but this is a pretty
trivial investment per node.
rgb
--
Robert G. Brown http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb at phy.duke.edu
More information about the Beowulf
mailing list