[Beowulf] Contents of Compute Nodes Images vs. Login Node Images
Chris Samuel
chris at csamuel.org
Tue Oct 23 23:34:34 PDT 2018
On Wednesday, 24 October 2018 3:15:51 AM AEDT Ryan Novosielski wrote:
> I realize this may not apply to all cluster setups, but I’m curious what
> other sites do with regard to software (specifically distribution packages,
> not a shared software tree that might be remote mounted) for their login
> nodes vs. their compute nodes.
At VLSCI we had separate xCAT package lists for both, but basically the login
node's list was a superset of the compute node list. These built RAMdisk
images, so keeping them lean (on top of what xCAT automatically strips out for
you) was important.
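The trick to keeping the two lists in step is that xCAT pkglists can include
one another, so the login list just pulls in the compute one and adds to it.
Something along these lines (the file paths and package names here are made
up for illustration):

  # compute.pkglist - lean set for the RAMdisk image
  slurm
  lustre-client

  # login.pkglist - everything compute gets, plus user-facing extras
  #INCLUDE:/install/custom/netboot/compute.pkglist#
  emacs
  subversion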
Here at Swinburne we run the same image on both, but that's a root filesystem
chroot on Lustre, so size doesn't impact memory usage (the node boots a
patched oneSIS RAMdisk that brings up OPA and mounts Lustre, then pivots over
onto the image there for the rest of the boot). The kernel has a patched
overlayfs2 module that does clever things for that part of the tree to avoid
constantly stat()ing Lustre for things it has already cached (IIRC, that's a
colleague's code).
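For those who've not seen that style of boot, the early userspace does
roughly this before the normal init takes over (the NID, fsname and paths
below are invented for the example):

  # Bring up the fabric, mount Lustre, then pivot onto the image there.
  modprobe hfi1                                   # Omni-Path HFI driver
  mount -t lustre 10.1.1.1@o2ib:/images /lustre   # invented MGS NID/fsname
  mount --bind /lustre/rootfs /newroot            # the shared root chroot
  exec switch_root /newroot /sbin/init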
We install things into the master copy of the chroot (tracked with git), then
have a script that turns the cache mode off across the cluster, rsyncs things
into the actual chroot area, does a drop_caches, and then turns the cache mode
on again.
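That script is essentially this shape. pdsh and rsync are the real tools (and
pdsh -g assumes a genders-style group for the nodes), but the knob for the
cache mode is part of our local patch, so the path below is just a stand-in:

  #!/bin/bash
  set -e
  MASTER=/cluster/master-chroot   # git-tracked master copy
  LIVE=/lustre/rootfs             # live chroot the nodes booted from

  # Stop nodes trusting their cached view of the tree (stand-in path).
  pdsh -g compute 'echo 0 > /sys/fs/overlay_cache/enabled'

  # Push the changes out, then drop cached dentries/inodes so the
  # nodes see the new files rather than stale cached ones.
  rsync -a --delete "$MASTER"/ "$LIVE"/
  pdsh -g compute 'echo 3 > /proc/sys/vm/drop_caches'

  # Caching back on.
  pdsh -g compute 'echo 1 > /sys/fs/overlay_cache/enabled'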
Hope that helps!
Chris
--
Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC