[Beowulf] Green Cluster?
Lombard, David N
dnlombar at ichips.intel.com
Wed Jul 23 14:43:55 PDT 2008
On Sat, Jul 19, 2008 at 01:40:59PM -0700, fkruggel at uci.edu wrote:
> Thanks for your suggestions. Let me be more specific.
> I would like to have nodes automatically wake up when
> needed and go to sleep when idle for some time. My
> ganglia logs tell me that there is considerable idle
> time on our cluster. The issue is that I would like to
> have the cluster adapt *automatically* to the load,
> without interaction of an administrator.
Sounds like a plan...
> Here is how far I got:
> I can set a node to sleep (suspend-to-ram) using ACPI.
> But for powering on, I have to press the power button.
> No automatic solution.
...
> Is it possible to wake up a node over lan (without reboot)?
It depends. (Did you actually expect a different answer?)
Setting the wakeup events *may* help. What does /proc/acpi/wakeup
show? Here's an example from a D975PBZ running F7's 2.6.23:
Device  S-state  Status     Sysfs node
TANA    S4       disabled   pci:0000:02:01.0
P0P3    S4       disabled   pci:0000:00:1e.0
AC97    S4       disabled
USB0    S3       disabled   pci:0000:00:1d.0
USB1    S3       disabled   pci:0000:00:1d.1
USB2    S3       disabled   pci:0000:00:1d.2
USB3    S3       disabled   pci:0000:00:1d.3
USB7    S3       disabled   pci:0000:00:1d.7
UAR1    S4       disabled   pnp:00:07
SLPB    S4       *enabled
Note, only SLPB (sleep button) is enabled by default on this system.
NB:
- the "TANA" device on *this* system is the NIC
- setting wol via ethtool doesn't affect the above.
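
For reference, the wake-up itself on a WoL-armed NIC is normally triggered by a "magic packet": six 0xFF bytes followed by the target MAC address repeated 16 times. The sketch below builds and broadcasts one. The function names, the UDP port (9), and the broadcast address are my assumptions, not anything from this system, and whether the packet actually brings a node back from S3 still depends on the BIOS, the NIC, and the wakeup settings above.

```python
import socket


def build_magic_packet(mac: str) -> bytes:
    """Return the 102-byte Wake-on-LAN payload for a MAC like 00:11:22:33:44:55."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF sync bytes, then the MAC repeated 16 times.
    return b"\xff" * 6 + mac_bytes * 16


def send_magic_packet(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    """Broadcast the magic packet over UDP (port 9 is a common choice, assumed here)."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))
```

From the head node this would be called as send_magic_packet("00:11:22:33:44:55"), with the sleeping node's real MAC address substituted.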
And here's an old Dell Inspiron running kernel.org's 2.6.23.8:
# cat /proc/acpi/wakeup
Device  S-state  Status     Sysfs node
LID     S3       *enabled
PBTN    S4       *enabled
PCI0    S3       disabled   no-bus:pci0000:00
UAR1    S3       disabled   pnp:00:0d
MPCI    S3       disabled
Here both the lid (LID) and power (PBTN) buttons are enabled by default.
Also note the S-state column: it gives the deepest ACPI sleep state from
which that device can wake the system.
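
If you want to check this across many nodes, the listing is easy to parse. Here is a minimal sketch that assumes the column layout shown in the two listings above (device, S-state, status, optional sysfs node); other kernels may format the file differently. The WakeupEntry type and the function name are just illustrative.

```python
from typing import List, NamedTuple


class WakeupEntry(NamedTuple):
    device: str
    s_state: str      # deepest sleep state the device can wake from
    enabled: bool
    sysfs_node: str   # empty string when the kernel lists none


def parse_acpi_wakeup(text: str) -> List[WakeupEntry]:
    """Parse the contents of /proc/acpi/wakeup into a list of entries."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Device S-state Status Sysfs node" header.
        if len(fields) < 3 or fields[0] == "Device":
            continue
        device, s_state, status = fields[:3]
        sysfs_node = fields[3] if len(fields) > 3 else ""
        enabled = status.lstrip("*") == "enabled"
        entries.append(WakeupEntry(device, s_state, enabled, sysfs_node))
    return entries


# The Dell listing from this message, used as sample input.
SAMPLE = """\
Device S-state Status Sysfs node
LID S3 *enabled
PBTN S4 *enabled
PCI0 S3 disabled no-bus:pci0000:00
UAR1 S3 disabled pnp:00:0d
MPCI S3 disabled
"""

print([e.device for e in parse_acpi_wakeup(SAMPLE) if e.enabled])
# prints: ['LID', 'PBTN']
```

On a real node you would pass in open("/proc/acpi/wakeup").read() instead of SAMPLE.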
If you need to enable a device, use
# echo _device_ enable > /proc/acpi/wakeup
where _device_ is the name listed in /proc/acpi/wakeup
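
To script that step, it is safer to read the current status first and write only when the device is disabled. As far as I know, some kernels treat a write to this file as a toggle rather than a set, in which case an unconditional write could switch off a device that was already enabled; check the behaviour on your kernel. A minimal sketch, with hypothetical function names, that needs root on a real node:

```python
from typing import Optional


def wakeup_status(text: str, device: str) -> Optional[bool]:
    """Return True/False for enabled/disabled, or None if the device is not listed."""
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and fields[0] == device:
            return fields[2].lstrip("*") == "enabled"
    return None


def enable_wakeup(device: str, path: str = "/proc/acpi/wakeup") -> bool:
    """Enable wakeup for `device` if it is disabled. Returns True if a write was made."""
    with open(path) as f:
        status = wakeup_status(f.read(), device)
    if status is None:
        raise KeyError("device %r not listed in %s" % (device, path))
    if status:
        return False  # already enabled: do not write, in case the write toggles
    with open(path, "w") as f:
        # Same string as the echo command above.
        f.write("%s enable\n" % device)
    return True
```

For the D975PBZ above, enable_wakeup("TANA") would be the call that arms the NIC as a wakeup source.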
Here's the Dell responding to a lid close in a very very minimal system
(kernel, busybox, uClibc):
# Stopping tasks ... done.
Suspending console(s)
Opening the lid produces this after about 6 seconds:
pnp: Device 00:0d disabled.
ACPI: PCI Interrupt 0000:00:03.0[A] -> Link [LNKD] -> GSI 11 (level, low) -> IR1
ACPI: PCI Interrupt 0000:00:03.1[A] -> Link [LNKD] -> GSI 11 (level, low) -> IR1
pnp: Device 00:0d activated.
Restarting tasks ... done.
#
> How can I detect that a node was idle for some specific time?
This all really needs to be run from the RM (resource manager). The RM
can know when a job ends on a node and that a node will or will not be
free in the future. The RM can also manage the scheduler to avoid bringing
sleeping nodes up until they're actually needed--a SMOP left as an exercise
to the reader ;)
I *think* Moab may do some of this stuff already.
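
To make the idea concrete, here is a rough sketch of the decision logic such a hook might run on each scheduling pass. Everything in it is hypothetical: the function names, the inputs (which a real RM would supply from its own job and reservation tables), and the 30-minute threshold. It is not how Moab or any other RM actually does it.

```python
from typing import Dict, List, Set


def nodes_to_suspend(
    last_job_end: Dict[str, float],   # node -> time its last job finished (seconds)
    busy: Set[str],                   # nodes currently running a job
    reserved: Set[str],               # nodes the scheduler has promised to a future job
    now: float,
    idle_threshold: float = 1800.0,   # assumed policy: suspend after 30 idle minutes
) -> List[str]:
    """Return nodes that have been idle long enough and are not needed soon."""
    candidates = []
    for node, ended in last_job_end.items():
        if node in busy or node in reserved:
            continue
        if now - ended >= idle_threshold:
            candidates.append(node)
    return sorted(candidates)


def nodes_to_wake(asleep: Set[str], nodes_needed: int, idle_awake: int) -> List[str]:
    """Return just enough sleeping nodes to cover what the awake idle nodes cannot."""
    shortfall = max(0, nodes_needed - idle_awake)
    return sorted(asleep)[:shortfall]
```

The point of putting the reserved set in the RM's hands is that only the scheduler knows a node is about to be needed; a node-local idle timer cannot see that.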
--
David N. Lombard, Intel, Irvine, CA
I do not speak for Intel Corporation; all comments are strictly my own.