[Beowulf] cluster deployment and config management
Rémy Dernat
remy.dernat at umontpellier.fr
Tue Sep 5 04:52:50 PDT 2017
Hi,
On 05/09/2017 at 08:57, Carsten Aulbert wrote:
> Hi
>
> On 09/05/17 08:43, Stu Midgley wrote:
>> Interesting. Ansible has come up a few times.
>>
>> Our largest cluster is 2000 KNL nodes and we are looking towards 10k...
>> so it needs to scale well :)
>>
> We went with Ansible at the end of 2015 until, after a few months, we
> hit a roadblock with it not using a client daemon. When having a few 1000
> states to perform on each client, the lag for initiating the next state
> centrally from the server was quite noticeable - in the end a single run
> took more than half an hour without any changes (for a single host!).
>
> After that we re-evaluated, and SaltStack was the outcome, scaling
> well enough for our O(2500) clients.
+1 for SaltStack here. It performs very well on large
infrastructures (see the docs:
https://docs.saltstack.com/en/latest/topics/tutorials/intro_scale.html )
and allows complex rules with reactors and orchestrators (including some
ways to manage post-reboot/connections).
There is also a GitHub project that deploys a cluster from
scratch with SaltStack, on a CentOS base, with PXE, DHCP, DNS,
kickstart...:
https://github.com/oxedions/banquise/
Personally, I will use SaltStack with FAI ( https://fai-project.org/ )
to deploy my nodes (it works, but it needs some additional tests). Or
maybe I will switch to banquise, but for now this project is still a
bit too young and I need a Debian base OS (I know it is planned,
pending preseed config management through Salt). I am using
gitfs as a SaltStack backend, and I also keep some config files in
another git repository (e.g. environment modules files).
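To put the figures Carsten quotes in perspective, here is a rough
back-of-envelope sketch of the per-state overhead of that push-based run.
The numbers are assumptions on my side, not measurements: I read "a few
1000 states" as 2000 to 3000, and "more than half an hour" as a lower
bound of 1800 s. The function name is only illustrative.

```python
# Back-of-envelope estimate of the per-state overhead of a no-change run.
# Assumptions (not measured): "a few 1000 states" is read as 2000-3000,
# and "more than half an hour" is taken as a lower bound of 1800 seconds.


def per_state_overhead(run_seconds: float, n_states: int) -> float:
    """Average wall-clock seconds spent per state when nothing changes."""
    if n_states <= 0:
        raise ValueError("n_states must be positive")
    return run_seconds / n_states


RUN_SECONDS = 30 * 60  # lower bound for a single run on a single host

for n_states in (2000, 3000):
    overhead = per_state_overhead(RUN_SECONDS, n_states)
    print(f"{n_states} states: at least {overhead:.2f} s per state")
```

Under these assumptions, each state costs somewhere above half a second
even when it changes nothing, which is consistent with the lag coming
from the server initiating every state remotely.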
Best regards,
Rémy.
>
> Note, I have not tracked if and how Ansible progressed over the past
> ~2 yrs, so it may or may not exhibit the same problems today.
>
> Cheers
>
> Carsten
>