[Beowulf] standards for GFLOPS / power consumption measurement?

Robert G. Brown rgb at phy.duke.edu
Tue May 10 11:08:25 PDT 2005


On Tue, 10 May 2005, Timothy Bole wrote:

> this seems to me, at least, to be a bit of an unfair comparison.  if
> someone were to just give me a cluster with 80386 processors, then i would
> tie for the lead forever, as 0/{any number>0}=0. {not counting if someone
> were to *pay* me to take said cluster of 80386's}...

Actually there is a lower bound, determined by Moore's Law and the cost
of baseline infrastructure.

Let us suppose that it costs some amount to run a node for a year, more
or less independent of what kind of node it is.  This cost is actually
NOT flat -- older nodes cost more in human time and parts and newer
nodes may use more power and AC, human management and administration
costs vary somewhat -- but the modifications for specific actual systems
are obvious and can be made in any real cost benefit analysis.  So let's
make it flat, and furthermore assume an infrastructure cost of I = 100
GP per node per year ("gold pieces", a unit of money in World of
Warcraft, to avoid getting too specific).

Let us further assume that it costs us some multiple of this amount to
buy a brand spanking new node, and that we will compare apples to
apples -- the older nodes and newer nodes are "the same" as far as
being rackmount vs shelfmount and single vs dual processor, and both
fortuitously have enough memory to run a satisfactory operating system
and our favorite task (not likely to be true for older systems), enough
disk and network capacity, etc.  That's not because these other
dimensions aren't ALSO important and potential show stoppers, but
because comparing raw CPU is enough.  Again, modifying for other
bottlenecks or critical resources can be done, although it gets less
and less straightforward as one introduces additional complexity into
the CBA.  To be specific, we'll assign a cost of N*I = 1000 GP for a
brand new node.

Finally, let us assume that Moore's Law holds and on average describes
the evolution of raw CPU speed on the task of interest, with a doubling
time of 18 months.
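
Under that assumption, the speed handicap of an old node follows
directly from its age.  Here is a minimal Python sketch of that one
relation (the 1.5 year doubling time is just the assumption above):

  # speed handicap of an old node relative to a brand new one,
  # assuming an 18 month (1.5 year) Moore's Law doubling time
  def slowdown(age_years, doubling_years=1.5):
      return 2 ** (age_years / doubling_years)

  print(slowdown(3.0))   # 4.0, a three year old node is 4x slower
  print(slowdown(4.5))   # 8.0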

It is then straightforward to compute, under these assumptions, the
break-even cost of older hardware.  For example, three year old systems
are four times slower (so we need to buy four of them to equal one new
system in terms of potential work throughput).  They also cost us four
hits of I per year for infrastructure vs one hit of I for a single new
system.
The total cost of a new system is 1000 GP + 100 GP per year of
operational life.  If we only consider a single year of expected
operational life, this totals 1100 GP.

To find the break-even point on the three year old systems, we subtract
their single-year infrastructure cost (4 x 100 GP = 400 GP) from the
1100 GP and divide the result by four:

  (1100 - 400)/4 = 700/4 = 175 GP per system

If our estimate for I was accurate and all other things are equal, we
break even if we buy the four 3 year old systems for 175 GP apiece,
expecting to use them for only a year.

If we plan to use them for two years:

  (1200 - 800)/4 = 400/4 = 100 GP per system

and we can spend at most 100 GP per system, although the assumption
that the 3 year old systems will keep functioning out to year five
starts getting a bit hairy.

If we plan to use them for three years:

  (1300 - 1200)/4 = 100/4 = 25 GP per system

and somebody would pretty much have to give the systems to you in order
for you to break even.  At this point the probability that four systems
obtained in year three will survive to year six without additional costs
is almost zero.
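
The same arithmetic, generalized over node age and planned service
life, fits in a few lines of Python.  This is only a sketch of the toy
model above (the names I, NEW and breakeven_per_node are just labels
for this sketch, nothing standard):

  I = 100      # infrastructure cost per node per year, in GP
  NEW = 1000   # purchase price of a brand new node, in GP

  def breakeven_per_node(age_years, planned_years, doubling_years=1.5):
      k = 2 ** (age_years / doubling_years)     # old nodes per new node
      new_total = NEW + I * planned_years       # one new node, lifetime cost
      old_infra = k * I * planned_years         # infrastructure for k old nodes
      return (new_total - old_infra) / k        # max purchase price per old node

  for years in (1, 2, 3):
      print(years, breakeven_per_node(3, years))   # 175.0, 100.0, 25.0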

Clearly there is an absolute break-even point even for a single year of
operation.  In this example it comes when a brand new system is 11
times faster, that is, when the old systems are about 1.5*log_2(11) =
5.2 years old: eleven of them cost 11 x 100 GP = 1100 GP in
infrastructure alone, the entire one-year cost of the new system.  At
that point, if someone GIVES you the systems to run for a single year,
if all of the simplifying assumptions are correct, and if all eleven
five year old systems survive the year without a maintenance incident,
then you break even.
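
That cutoff age falls out of the same numbers; again, just a sketch
under the assumptions above:

  import math
  I, NEW, Y = 100, 1000, 1
  k = (NEW + I * Y) / (I * Y)    # 11 free old nodes eat the whole 1100 GP
  print(1.5 * math.log2(k))      # ~5.19 years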

Now, this is all upper bound stuff.  In reality the boundaries for break
even are much closer than five years -- this is simply the theoretical
boundary for this particular set of assumptions.  I personally wouldn't
accept four year old systems as a gift.  Three year old systems MIGHT be
worth accepting to run for a year in production, or longer in a starter,
home, or educational cluster (where performance/production per se isn't
the point), although many a systems person I know wouldn't accept
anything older than two years old unless it was 2+ year old bleeding
edge (when new) hardware and so had some sort of performance boost
relative to the assumptions above.

These numbers aren't terribly unrealistic, except that Amdahl's law,
the nonlinear costs associated with the exponentially increasing
probability of maintenance, the difficulty of getting replacement
parts, the human energy required to squeeze a modern Linux onto older
systems, the difficulties one will have with networking, the space old
systems take up, and a fairer ratio of infrastructure cost to new
system cost will all push you to the conclusion that to REALLY optimize
TCO and cost/benefit your cluster should almost certainly be rollover
replaced every two to three years, four at the absolute outside.

As for 386's -- a single 2.4 GHz AMD 64 box purchased for $500 is
roughly 20 years ahead of the 386.  In raw numbers that is in the
ballpark of 10,000 times faster (according to Moore's Law).  In raw
clock it is roughly 100x faster (2.4 GHz against the 386's 16-40 MHz),
and the four processor generations in between (486, Pentium, P6, 64
bit) each contribute at least a factor of two per clock (far more than
that for floating point, where a 386/387 needed dozens of clocks per
operation), which carries the raw clock ratio the rest of the way to
the Moore's Law figure.
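
As a quick sanity check on that ballpark (a one liner, nothing more):

  print(2 ** (20 / 1.5))   # ~10,300x over 20 years of 18 month doublings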

It costs more for a systems person to LOOK AT a 386, let alone actually
set it up or try to get anything at all to run on it, than that system
can contribute to actual production compared to absolutely anything one
can buy new today.  The same is true of all 80486's, all Pentiums, all
Pentium Pro's, all PII's and PIII's, and most Celerons (with a possible
exception for brand new ones at the highest current clock, although I
personally think the AMD 64 kills the venerable Celery dead).

Brutal, sure, but it's just the way of the world given exponential
growth in benefit at constant cost.

This same sort of analysis can be extended to non-HPC TCO CBAs as well
(although it is SO rare to see it done).  In a departmental or
corporate LAN, for example, the issue is complicated by the complexity
of the application space and the productivity benefits associated with
upgrades,
which range from nil for people who dominantly use only e.g. office type
applications and web browsers to high for people who actually "use"
their computer's full capacity in some dimension to accomplish useful
work.  The scaling of maintenance and infrastructure is also frequently
dominated as much by human issues (support, training, and so on) as it
is by hardware, in contrast to much of the HPC market.  So a much more
informed and careful strategy is needed to optimize cost benefit and
plan for rollover replacement.  Alas, most organizations just can't
conceptually manage this degree of complexity and opt instead for a
simplistic, flat "fair" policy that ends up being a wasteful and stupid
way to allocate scarce resources nearly all of the time, but which
concentrates power in the hands of an entrenched bureaucracy and
reduces the need for human intelligence to support operations to near
zero.

   rgb

> 
> having inhabited many an underfunded academic department, i have seen that
> there are many places where there is just not money to throw at any
> research labs, including computational facilities.  i think that the point
> of the article was to demonstrate that one can build a useful beowulf for
> a dollar amount that is not unreasonable to find at small companies and
> universities.  not everyone can count on the generosity of strangers
> handing out network cards and hubs.  so, the US$/GFLOP is a decent, but
> *very* generic, means of getting the most of that generic dollar.
> 
> of course, the bottom line is that a cost benefit analysis for any cluster
> is really necessary, and the typical type of problem to be run on said
> cluster should factor into this.  i applaud the work of the KRONOS team
> for demonstrating the proof-of-principle that one can design and build a
> useful beowulf for US$2500.
> 
> cheers,
> twb
> 
> 
> On Tue, 10 May 2005, Vincent Diepeveen wrote:
> 
> > How do you categorize second hand bought systems?
> >
> > I bought for 325 euro a third dual k7 mainboard + 2 processors.
> >
> > The rest i removed from old machines that get thrown away otherwise.
> > Like 8GB harddisk. Amazingly biggest problem was getting a case to reduce
> > sound production :)
> >
> > Network cards i got for free, very nice gesture from someone.
> >
> > So when speaking of gflops per dollar at linpack, this will destroy of
> > course any record of $2500 currently, especially for applications needing
> > bandwidth to other processors, if i see what i paid for this self
> > constructed beowulf.
> >
> > At 05:19 PM 5/9/2005 -0400, Douglas Eadline - ClusterWorld Magazine wrote:
> > >On Thu, 5 May 2005, Ted Matsumura wrote:
> > >
> > >> I've noted that the orionmulti web site specifies 230 Gflops peak, 110
> > >> sustained, ~48% of peak with Linpack which works out to ~$909 / Gflop ?
> > >>  The Clusterworld value box with 8 Sempron 2500s specifies a peak Gflops by
> > >> measuring CPU Ghz x 2 (1 - FADD, 1 FMUL), and comes out with a rating of 52%
> > >> of peak using HPL @ ~ $140/Gflop (sustained?)
> > >
> > >It is hard to compare. I don't know what sustained or peak means in the
> > >context of their tests. There is the actual number (which I assume is
> > >sustained) then the theoretical peak (which I assume is peak).
> > >
> > >And our cost/Gflop does not take into consideration the construction
> > >cost. In my opinion, when reporting these types of numbers, there
> > >should be two categories "DIY/self assembled" and "turn-key". Clearly
> > >Kronos is a DIY system and will always have an advantage over a
> > >turnkey system.
> > >
> > >
> > >>  So what would the orionmulti measure out with HPL? What would the
> > >> Clusterworld value box measure out with Linpack?
> > >
> > >Other benchmarks are here (including some NAS runs):
> > >
> > >http://www.clusterworld.com/kronos/bps-logs/
> > >
> > >>  Another line item spec I don't get is rocketcalc's (
> > >> http://www.rocketcalc.com/saturn_he.pdf )"Max Average Load" ?? What does
> > >> this mean?? How do I replicate "Max Average Load" on other systems??
> > >>  I'm curious if one couldn't slightly up the budget for the clusterworld box
> > >> to use higher speed procs or maybe dual procs per node and see some
> > >> interesting value with regards to low $$/Gflop?? Also, the clusterworld box
> > >> doesn't include the cost of the "found" utility rack, but does include the
> > >> cost of the plastic node boxes. What's up with that??
> > >
> > >This was explained in the article. We assumed that shelving was optional
> > >because others may wish to just put the cluster on existing shelves or
> > >table top (or with enough Velcro strips and wire ties build a standalone
> > >cube!)
> > >
> > >Doug
> > >>
> > >
> > >----------------------------------------------------------------
> > >Editor-in-chief                   ClusterWorld Magazine
> > >Desk: 610.865.6061
> > >Cell: 610.390.7765         Redefining High Performance Computing
> > >Fax:  610.865.6618                www.clusterworld.com
> > >
> > >_______________________________________________
> > >Beowulf mailing list, Beowulf at beowulf.org
> > >To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
> > >
> > >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> >
> 
> =========================================================================
> Timothy W. Bole a.k.a valencequark
> Graduate Student
> Department of Physics
> Theoretical and Computational Condensed Matter
> UMBC
> 4104551924
> reply-to: valencequark at umbc.edu
> 
> http://www.beowulf.org
> =========================================================================
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

-- 
Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




