[Beowulf] Problems with Dell M620 and CPU power throttling

Aaron Knister knistera at ncbi.nlm.nih.gov
Thu Feb 6 06:30:09 PST 2014


Bill Wichser <bill <at> princeton.edu> writes:

> 
> We have tested using c1 instead of c0 but no difference.  We don't use 
> logical processors at all.  When the problems happens, it doesn't matter 
> what you set the cores for C1/C0, they never get up to speed again 
> without a power cycle/reseat.  We believe this to be something related 
> to power.  Maybe current limiting.
> 
> As I stated yesterday, after a complete chassis power cycle on Tuesday 
> Sept 10, the entire 37 chassis have been outperforming their 2.6GHz 
> ratings flawlessly.  I don't know if this is going to be the solution we 
> have been searching to find but it has certainly been a week and a half 
> of some very happy researchers!
> 
> Thanks,
> Bill
> 
> On 09/19/2013 11:32 AM, Christopher Samuel wrote:
> > -----BEGIN PGP SIGNED MESSAGE-----
> > Hash: SHA1
> >
> > On 18/09/13 10:49, Douglas O'Flaherty wrote:
> >
> >> "Run in C1. C0 over commits unpredictably, then throttles."
> > I've seen a recommendation in a public Mellanox document of using C1
> > not C0 when using hyperthreading/SMT, could be related to this..
> >
> > - -- 
> >   Christopher Samuel        Senior Systems Administrator
> >   VLSCI - Victorian Life Sciences Computation Initiative
> >   Email: samuel <at> unimelb.edu.au Phone: +61 (0)3 903 55545
> >   http://www.vlsci.org.au/      http://twitter.com/vlsci
> >
> > -----BEGIN PGP SIGNATURE-----
> > Version: GnuPG v1.4.12 (GNU/Linux)
> > Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/
> >
> > iEYEARECAAYFAlI7GPAACgkQO2KABBYQAh9zPQCfeOCdUupjqx7SDeFxQjBWG9NU
> > FL4AnRYA3zLCNzEVNp0ypiW9KMYp3ohW
> > =ntfO
> > -----END PGP SIGNATURE-----
> > _______________________________________________
> > Beowulf mailing list, Beowulf <at> beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

Hi Bill,

I'm wondering if this issue has resurfaced for you after the firmware
updates and chassis power cycles?

I'm having what sounds like the same issue, but with R320s. So far
BIOS/iDRAC/Lifecycle Controller updates haven't helped, but I haven't tried
physically removing power from the node. I have been using the "ipmitool
power cycle" command to reboot the nodes and get them out of their funk
(running at 0.2GHz), but that, of course, still leaves part of the chassis
energized.
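
A minimal sketch of how nodes stuck in this state might be spotted, assuming a
Linux node whose /proc/cpuinfo reports a "cpu MHz" line per core. The 500 MHz
threshold and the function names are illustrative assumptions, not something
from the original report; idle cores normally sit at their lowest P-state
(often around 1.2GHz), so a reading near 0.2GHz stands out.

```python
import re
from pathlib import Path

# Hypothetical cutoff: anything below this is treated as "stuck throttled".
THROTTLE_MHZ = 500.0


def parse_cpu_mhz(cpuinfo_text):
    """Return the 'cpu MHz' values from /proc/cpuinfo-style text, in file order."""
    pattern = r"^cpu MHz\s*:\s*([0-9.]+)\s*$"
    return [float(v) for v in re.findall(pattern, cpuinfo_text, flags=re.MULTILINE)]


def throttled_cores(cpuinfo_text, threshold_mhz=THROTTLE_MHZ):
    """Return (position, MHz) pairs for cores reporting below the threshold.

    'position' is the order in which the core appears in the text, which is
    assumed (not guaranteed) to match the kernel's processor numbering.
    """
    return [
        (i, mhz)
        for i, mhz in enumerate(parse_cpu_mhz(cpuinfo_text))
        if mhz < threshold_mhz
    ]


if __name__ == "__main__":
    cpuinfo = Path("/proc/cpuinfo")
    if cpuinfo.exists():
        slow = throttled_cores(cpuinfo.read_text())
        if slow:
            for position, mhz in slow:
                print(f"core {position}: {mhz:.0f} MHz (below {THROTTLE_MHZ:.0f} MHz)")
        else:
            print("no cores below threshold")
    else:
        print("/proc/cpuinfo not available on this system")
```

Run on each node (for example via a parallel shell), this only reports the
symptom; it does not distinguish power capping from other causes of low clocks.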

Thanks!

-Aaron




