[Beowulf] Jeff Squyres MPI proposals
Christopher Samuel
samuel at unimelb.edu.au
Thu Mar 3 15:30:23 PST 2016
On 04/03/16 06:40, Douglas Eadline wrote:
> Yes, failure needs to be an option.
The Slurm folks have been working on failure management support for a
little while. The idea is that a job can draw on a pool of spare nodes
when one of its nodes fails, or alternatively bargain with the scheduler
for a currently busy node to be added to the job once it comes free,
potentially extending the walltime to make up for the shortfall.
A better description from someone with higher caffeination is here:
http://slurm.schedmd.com/nonstop.html
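For what it's worth, here is a rough toy sketch in Python of the spare
pool idea as I described it above. It is not Slurm's API and says
nothing about how Slurm implements this; Job, SparePool,
handle_node_failure and the 30 minute extension are all names and
numbers made up for illustration. The "bargain for a busy node" case is
only approximated by extending the walltime when no spare is free.

```python
from dataclasses import dataclass, field


@dataclass
class Job:
    """Toy job: the nodes it runs on and its walltime limit in minutes."""
    nodes: list
    walltime_min: int


@dataclass
class SparePool:
    """Hypothetical pool of idle spare nodes (not a Slurm data structure)."""
    spares: list = field(default_factory=list)

    def take(self):
        # Hand out the first spare if one is left, otherwise None.
        return self.spares.pop(0) if self.spares else None


def handle_node_failure(job, failed, pool, extend_min=30):
    """Drop a failed node from the job and try to swap in a spare.

    Returns the replacement node name, or None if the pool was empty.
    With no spare available the job runs short-handed, so its walltime
    is extended by extend_min (an arbitrary, assumed amount).
    """
    job.nodes.remove(failed)
    replacement = pool.take()
    if replacement is not None:
        job.nodes.append(replacement)
    else:
        job.walltime_min += extend_min
    return replacement


job = Job(nodes=["n1", "n2", "n3"], walltime_min=120)
pool = SparePool(spares=["spare1"])
handle_node_failure(job, "n2", pool)
print(job.nodes, job.walltime_min)
```

A second failure with the pool now empty would leave the job one node
short and bump its walltime instead, which is the "make up for the
shortfall" part.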
All the best,
Chris
--
Christopher Samuel Senior Systems Administrator
VLSCI - Victorian Life Sciences Computation Initiative
Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
http://www.vlsci.org.au/ http://twitter.com/vlsci