[Beowulf] cluster deployment and config management
Bill Broadley
bill at cse.ucdavis.edu
Wed Sep 6 02:46:19 PDT 2017
On 09/05/2017 07:14 PM, Stu Midgley wrote:
> I'm not feeling much love for puppet.
I'm pretty fond of puppet for managing clusters. We use cobbler to go from PXE
boot -> installed, then puppet takes over.
Some of my favorite features:
* Inheritance is handy: node -> node for a particular cluster -> compute node ->
head node
* Tags for handling users are handy. With 1200 users, a dozen clusters, and
various other bits of infrastructure, they make it really easy to manage who
gets access to what.
* I like the self-healing aspect: you define the system state, not how to get
there. That way, if I repurpose or patch a node, or mistakenly make it
unique in some way, the next puppet run fixes it.
* Definitely helps with re-use across clusters. Makes for a higher incentive
to do it right the first time.
* Using facts to make decisions is really useful. Things like detecting if you
are a virtual machine, or updating autofs maps if IB is down.
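
To make the "define the state, not the steps" and "decide from facts" points
concrete, here is a rough sketch in Python. It is not Puppet code and not how
Puppet is implemented. Every name in it (desired_state, converge, the fact
keys is_virtual and ib_up, the map names auto.ib and auto.eth, the
ipmi_monitor setting) is made up for illustration. It only shows the idea: the
desired state is computed from facts, and a run changes only what has drifted,
so a second run has nothing to do.

```python
from typing import Dict, List, Tuple

State = Dict[str, str]


def desired_state(facts: Dict[str, bool]) -> State:
    """Build the desired state for a node from its facts (all keys hypothetical)."""
    state = {"ntp": "running", "autofs_map": "auto.ib"}
    if facts.get("is_virtual", False):
        # Assumed example decision: no hardware monitoring on virtual machines.
        state["ipmi_monitor"] = "absent"
    else:
        state["ipmi_monitor"] = "running"
    if not facts.get("ib_up", True):
        # Assumed example decision: use a different autofs map when IB is down.
        state["autofs_map"] = "auto.eth"
    return state


def converge(current: State, desired: State) -> Tuple[State, List[str]]:
    """Return the corrected state and a description of each change that was needed."""
    new_state = dict(current)
    changes = []
    for key, want in sorted(desired.items()):
        have = current.get(key)
        if have != want:
            changes.append(f"{key}: {have!r} -> {want!r}")
            new_state[key] = want
    return new_state, changes


if __name__ == "__main__":
    facts = {"is_virtual": False, "ib_up": False}
    # A node that has drifted: ntp was stopped by hand, IB map still in place.
    drifted = {"ntp": "stopped", "autofs_map": "auto.ib", "ipmi_monitor": "running"}
    fixed, changes = converge(drifted, desired_state(facts))
    print(changes)
    # Running again against the fixed node reports no changes.
    print(converge(fixed, desired_state(facts))[1])
```

The point of the design is that converge never needs to know how the node got
into its current state, which is why a repurposed or hand-modified node ends
up the same as its neighbours after the next run.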
>
> On Wed, Sep 6, 2017 at 7:51 AM, Christopher Samuel <samuel at unimelb.edu.au>
> wrote:
>
> On 05/09/17 15:24, Stu Midgley wrote:
>
> > I am in the process of redeveloping our cluster deployment and config
> > management environment and wondered what others are doing?
>
> xCAT here for all HPC related infrastructure. Stateful installs for
> GPFS NSD servers and TSM servers, compute nodes are all statelite, so an
> immutable RAMdisk image is built on the management node for the compute
> cluster and then on boot they mount various items over NFS (including
> the GPFS state directory).
>
> Nothing like your scale, of course, but it works and we know if a node
> has booted a particular image it will be identical to any other node
> that's set to boot the same image.
>
> Healthcheck scripts mark nodes offline if they don't have the current
> production kernel and GPFS versions (and other checks too of course)
> plus Slurm's "scontrol reboot" lets us do rolling reboots without
> needing to spot when nodes have become idle.
>
> I've got to say I really prefer this to systems like Puppet, Salt, etc,
> where you need to go and tweak an image after installation.
>
> For our VM infrastructure (web servers, etc) we do use Salt for that. We
> used to use Puppet but we switched when the only person who understood
> it left. Don't miss it at all...
>
> cheers,
> Chris
> --
> Christopher Samuel Senior Systems Administrator
> Melbourne Bioinformatics - The University of Melbourne
> Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org <mailto:Beowulf at beowulf.org>
> sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
> <http://www.beowulf.org/mailman/listinfo/beowulf>
>
>
>
>
> --
> Dr Stuart Midgley
> sdm900 at gmail.com
>
>
>