[Beowulf] Fault tolerance & scaling up clusters (was Re: Bright Cluster Manager)
Roland Fehrenbacher
rf at q-leap.de
Sat May 19 01:46:44 PDT 2018
>>>>> "J" == Lux, Jim (337K) <james.p.lux at jpl.nasa.gov> writes:
J> On May 17, 2018, at 06:01, Roland Fehrenbacher <rf at q-leap.de>
J> wrote:
>>>>>>> "J" == Lux, Jim (337K) <james.p.lux at jpl.nasa.gov> writes:
>>
J> The reason I hadn't looked at "diskless boot from a server" is
J> the size of the image - assume you don't have a high bandwidth or
J> reliable link.
>>
>> This is not something to worry about with Qlustar. A (compressed)
>> Qlustar 10.0 image containing e.g. the core OS + slurm + OFED +
>> Lustre is a mere 165MB to be transferred (eating 420MB of RAM
J> 165 MB = 1.3 Gbit. At 64 kbps that's about 6 hrs.
Ouch. Sure, with 64 kbps you've had it. I wouldn't have expected that
kind of throughput at NASA in 2018. Or are these compute nodes in space
that you want to boot from a head-node in Houston? :)
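
For anyone who wants to redo the arithmetic quoted above for their own link, here is a rough sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 kbps = 10^3 bit/s) and an ideal link with no protocol overhead or retransmissions, so real transfers will take longer. The function name is just something I made up for illustration.

```python
def transfer_time_seconds(size_mb: float, link_kbps: float) -> float:
    """Ideal transfer time in seconds for size_mb megabytes over a link_kbps link.

    Assumes decimal units and ignores protocol overhead and retransmissions.
    """
    bits = size_mb * 1_000_000 * 8       # megabytes -> bits
    return bits / (link_kbps * 1_000)    # kbit/s -> bit/s


if __name__ == "__main__":
    secs = transfer_time_seconds(165, 64)
    print(f"{secs:.0f} s = {secs / 3600:.2f} h")  # 20625 s = 5.73 h
```

So the 165 MB image at 64 kbps comes out at a bit under 6 hours, in line with Jim's estimate.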