[Beowulf] Re: MS Cray
Robert G. Brown
rgb at phy.duke.edu
Fri Sep 19 04:13:58 PDT 2008
On Thu, 18 Sep 2008, Gus Correa wrote:
> "we've tried to lower the bar in terms of the talent required to deploy one
> of these at a customer site"
But this is always the point of a turnkey vs home-engineered system. Do
it yourself is cheaper, but you have to do it yourself. This means you
have to be CAPABLE of doing it yourself.
In my direct experience, the humans on this planet self-partition into
two groups. The partitioning isn't strictly on talent or intelligence,
although both can be factors, but on something else, an element of the
human psyche. One group can do "it" themselves and, truth be told,
PREFERS to do it themselves (for nearly any value of it). They may not
have the talent to do it when they start, but they do when they are
done within the natural limits of their God-given intelligence,
strength, wisdom, and number of hit points.
The other, simply put, does not. They do it themselves in the natural
regime of it imposed on them by a cruel nature that won't feed them or
provide them with shelter unless they do it, and bitterly resent both it
and any other it they are ever forced, kicking and screaming, to attempt
and half-assedly master. They are ever willing to let other people do
it on their behalf, and will often pay large sums of money not to have
to do it.
This partitioning extends through life from young to old, across
national and cultural boundaries, and across social strata from top to
bottom. One would expect doing it to provide social or economic
benefits (and liking to do it does), yet there are plenty of blue-collar
workers who passionately try to do it within their spectrum of talents
and opportunities, just as there are plenty of lazy-ass white-collar
workers who do it only if it involves oddly shaped sticks and a small
white ball, or telling other people to do it (the one it that people in
the latter category often feel uniquely qualified to do at all levels
and values of it).
So (to go back on topic after that deep philosophical observation to a
list composed almost entirely of people who love doing it:-) lowering
the bar doesn't really help. Anyone actually capable of doing
it (where "it" = "program a serious parallel application involving MPI
and a pile of computers") doesn't need the bar lowered. They might go
turnkey to save time, especially when they are lavishly funded so that
they aren't forced to do it to save money to be able to afford larger
resources and actually finish faster. They might go turnkey because
they are already doing it for values of it such as "doing immense
amounts of laboratory research on genetics" so that they simply don't
have the time to mess with it -- where "it" in the latter case almost
certainly includes using a cluster to develop applications, so that
they'd also just purchase/install free turnkey software to do the
required genomics.
Nearly everyone else would RATHER do it because it is fun, because it is
satisfying, because they can get 40-60% more hardware capacity with the
money they save, because doing it themselves they learn far more in the
process and end up actually able to use the resource that they've built
at its full capacity instead of just knowing it as a sort of black box
front ended by a Visual screen that obscures all of the all-important
detail that might have one day taught them what the hell they're doing
with it so that they could do it well.
I'm not making it up. The lesson that Cray et al. (and now Microsoft
too, jumping on the bandwagon so to speak) have apparently still failed
to learn is that the REASON that fifteen years ago nearly all the
supercomputers in the world were big iron fronted by expensive,
proprietary operating systems and custom and highly non-portable
compilers and programming tools (often purchased for millions of dollars
from Cray), whereas today these have all but disappeared from the world, is
that the people that actually use HPC are overwhelmingly in the category
that can do it, that likes doing it, that know that doing it themselves
gives them a degree of ownership and control and knowledge that is
utterly inaccessible to those who shy away from doing it and want it
done for them, especially if they can spend other people's money to get
it without mental engagement on their part. The latter people rarely
succeed in doing it. Darwin slowly but surely acts against them in the
highly competitive research environment.
The end result is that "successful" turnkey vendors in HPC (as opposed
to HA or "server farms") aren't black-box providers; they are integrators
and educators like Joe. Joe (correct me if I'm wrong, Joe) doesn't JUST
engineer a system to meet some customer's requirements, plug it in, and
leave -- he might engineer it and build it, but he builds it WITH the
client, teaching the client what it's all about, working with the client
to the point where the client in the end has full ownership and has
learned what they need to learn to use the tool and perhaps take care of
it. Scyld/Penguin provides much the same sort of thing. Bottom line:
you don't do HPC if you aren't capable and naturally inclined to do it
yourself, and the Top500 list has over fifteen years come to
overwhelmingly -- overwhelmingly -- reflect that simple fact. This is a
true dynamic phase transition and not particularly likely to suddenly
revert to the earlier phase just because Cray and Microsoft try to get
the research world to regress to a state where customers pay THEM most
of the money they'd otherwise spend getting more processors. If they
were likely to do that now, they never would have stopped then instead
of running like bunnies away from Crays and towards commodity clusters.
> However, my feeling is that teaching 101 courses in Unix/Linux proficiency
> (directory tree, command redirection, etc), Unix programming environment
> (make, etc),
> and Unix tools (vi or emacs, sed, awk, basic shell scripting, etc),
> for science and engineering students would increase the "talent required"
> at a much lower cost and with much higher benefits than buying Windows
> deskside supercomputers.
> Who would need a Windows based HPC then?
It's far past that. Who in the world is CAPABLE of programming a
parallel HPC who doesn't already know all of these things? Does
Microsoft seriously think that there exist fifty competent programmers
in the world that can write brilliant HPC-class programs using only
Microsoft compilers and tools but who have no idea what Unix is and how
it works and how to program in a Unixoid programming environment? Not
even their OWN programmers are that ignorant -- or rather, where they
are that ignorant they give us (gulp) >>Vista<< as an example of
superbly optimized, unbloated high performance code. I can see it now:
"Trust in the people that brought you Vista to provide you with high
performance tools that will produce the fastest code that will run on
the leanest resources."
Right. Like the 4 GB needed just to BOOT THE OPERATING SYSTEM of their
latest "desktop" version of Windows so that it doesn't run like a pig.
> To entice the freshmen students, RGB could give a few cool special lectures
> about Turing machines,
> cellular automata, etc, on these 101 classes.
> This would pay off much better than teaching C and C++ (say, with Visual
> Studio) to freshmen,
> would give them a background to use Unix/Linux machines effectively,
> and would prepare them to go beyond Matlab and Windows.
> Programming languages could be 201 or higher level courses.
I don't know offhand of any University that teaches Visual Studio to
freshmen as a means to learn C or C++, period. Damn few schools teach
C++ to freshmen, and nobody nowhere teaches C. Duke certainly doesn't,
and Duke and Microsoft are practically family. Community colleges might
teach it, and there might be courses at some level that teach it, but
generally speaking computer science departments -- once they get past
the intro courses in java -- use and teach Unixoid operating systems
(overwhelmingly Linux, but still quite a lot of e.g. Solaris) and
programming tools such as C, C++ (and rarely Fortran), often at the same
time.
After all, what can you learn from a closed source broken piece of, um,
"black box" like Windows (any flavor, any time)? How to fix a broken
registry? And what operating system, exactly, is written entirely in
C++ using a visual toolset? Does any professor WANT their students to
learn how IT "works", even if they could know themselves well enough to
teach it?
The difficulty of teaching or learning Unix/C programming tools and
methods is greatly exaggerated. I already teach this -- I'm teaching it
to one student this semester who has to learn Linux and C just to work
on a neural network project; I taught a student C from scratch last
semester (and the student is this semester taking an advanced course in
operating systems and programming and writing me lavish thank you notes
as he is blowing away the course, way ahead of his classmates).
I just loan them (and have them buy) a handful of books like Kernighan
and Pike and Glass and Ables (the latter a much more modern and useful
version of the venerable, nay classic, former). I teach them to use
jove: "Hey, bring up an xterm and run 'teachjove'. Check back with me
in a couple of days." I give them or show them a few free online or PDF
C books plus teach them about the man pages. I show them how to run a
Makefile from inside jove (Ctrl-X Ctrl-E, fingers stay right on those
keys where they do all of their work). I give them a C program template
(available on my website) to make it easy to get started with a
functioning ready-to-build example. I have them write three or four
very simple programs STARTING from that template to kind of get their
feet wet. I show them how to use SVN and threaten them with the hell of
utter failure if they don't use it and somehow lose their work by
overwriting their non-backed up sources. I deconstruct their programs,
whack them upside the head with a manual whenever the ratio of comment
lines to code drops much below 1:1, tell them that unindented code is
the invention of the Devil intended to lead poor lost souls into
purgatory, that good variable names and function call names aren't too
long, aren't too short, and DontUseSillyCaPS unless they mean something.
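The 1:1 comment-to-code rule of thumb is easy to check mechanically. Below is a minimal Python sketch (the function names are hypothetical, and it is a rough line classifier rather than a C parser: comment markers inside string literals are not recognized as such) that counts comment lines and code lines in a C source file:

```python
import sys


def comment_code_counts(source: str) -> tuple[int, int]:
    """Count (comment lines, code lines) in C source text.

    A line with both code and a comment counts toward both totals; blank
    lines count toward neither. This is a line classifier, not a parser:
    comment markers inside string or character literals are not skipped.
    """
    comment_lines = code_lines = 0
    in_block = False  # inside a /* ... */ comment carried over from earlier lines
    for line in source.splitlines():
        has_comment = has_code = False
        i = 0
        while i < len(line):
            if in_block:
                has_comment = True
                end = line.find("*/", i)
                if end == -1:
                    i = len(line)  # block comment continues on the next line
                else:
                    in_block = False
                    i = end + 2
            elif line.startswith("//", i):
                has_comment = True
                break  # the rest of the line is a comment
            elif line.startswith("/*", i):
                has_comment = True
                in_block = True
                i += 2
            else:
                if not line[i].isspace():
                    has_code = True
                i += 1
        comment_lines += has_comment
        code_lines += has_code
    return comment_lines, code_lines


def comment_ratio(source: str) -> float:
    """Comment lines per code line (infinity if there is no code at all)."""
    comments, code = comment_code_counts(source)
    return comments / code if code else float("inf")


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name) as f:
            print(f"{name}: {comment_ratio(f.read()):.2f} comment lines per code line")
```

A ratio well below 1.0 is the signal to reach for the manual.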
I teach them something that Microsoft-based tool users never really
figure out (witness the earlier posting from the Microsoft person) --
there is this thing called "ASCII" which compilers like, which produces
perfectly readable straight text, and which is PORTABLE across viewing
and editing environments, proprietary and non-proprietary. Good
programmers consequently nearly invariably use a straight ASCII text
editor, and eschew the use of the 8th bit and/or multibyte characters
unless for them straight text happens to be in Chinese or Arabic. Email
messages from good programmers to lists like this rarely contain those
funny looking @?isms that basically signal to the world "Hey, I wrote
this in a Microsoft mail program that uses an extended character set
even for everyday English text characters that (of course) any Microsoft
mail program will correctly render but that will (of course) look funny
on your non-Microsoft screen, which proves that you should use Microsoft
mail programs for all purposes so you can read my mail".
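Checking whether a source file has strayed outside 7-bit ASCII takes only a few lines. A minimal Python sketch (the function name and output format are hypothetical) that reports every byte with the 8th bit set, by line and byte column:

```python
import sys


def non_ascii_bytes(data: bytes) -> list[tuple[int, int, int]]:
    """Return (line, column, byte value) for every byte with the 8th bit set.

    Lines and columns are 1-based, and columns count bytes, so a multibyte
    UTF-8 character is reported once per byte.
    """
    hits = []
    line, col = 1, 0
    for b in data:  # iterating over bytes yields ints in 0..255
        col += 1
        if b == 0x0A:  # newline: start counting a new line
            line, col = line + 1, 0
        elif b > 0x7F:  # outside 7-bit ASCII
            hits.append((line, col, b))
    return hits


if __name__ == "__main__":
    for name in sys.argv[1:]:
        with open(name, "rb") as f:
            for line, col, b in non_ascii_bytes(f.read()):
                print(f"{name}:{line}:{col}: byte 0x{b:02x}")
```

Python's built-in `bytes.isascii()` (Python 3.7 and later) answers the plain yes/no question; the loop adds positions, printed in the familiar `file:line:column` compiler-message style.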
I teach them that Unix tools can be used to work dark magic, but that
there is a price to be paid in sweat and chickens slaughtered with
black-handled knives before you can decide that your code needs to be
completely reorganized into subdirectories and that as part of this
reorganization every program module needs a certain code fragment to be
changed to a different code fragment and you accomplish this with three
lines from the shell in around a minute or two of actual effort. They
don't call master programmers "wizards" for nothing, but knowledge is
power, and power is never cheap.
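On Unix the usual incantation is a short pipeline along the lines of `grep -rl` feeding `xargs sed -i` (GNU sed's in-place edit). As a rough illustration of what those few lines do, here is a minimal Python sketch of the same recursive, in-place replacement. The function name and arguments are hypothetical, and unlike `sed` it replaces a fixed string rather than a regular expression. Run anything like this only on a tree that is checked in to version control, so a bad substitution can be reverted.

```python
import pathlib


def replace_in_tree(root: str, pattern: str, old: str, new: str) -> list[pathlib.Path]:
    """Replace the literal text `old` with `new` in every file under `root`
    (recursively) whose name matches the glob `pattern`.

    Returns the files that were actually changed, in sorted order.
    """
    changed = []
    for path in sorted(pathlib.Path(root).rglob(pattern)):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8")
        if old in text:
            path.write_text(text.replace(old, new), encoding="utf-8")
            changed.append(path)
    return changed


if __name__ == "__main__":
    # Hypothetical example: rename a function call in every .c file under src/.
    for p in replace_in_tree("src", "*.c", "old_api(", "new_api("):
        print("changed", p)
```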
These students are obviously in the self-sorted "I can do it myself for
any value of it" category, but the point is that all programmers ARE in
this category unless they chose their profession to make money, as the
value of "it" that they are reluctantly forced to choose in order to
stay fed and clothed. Such "programmers" as often as not become
marketing droids or pointy-haired bosses because they do not know the
Tao of Programming and rarely become more than half-competent hacks.
Only programmers who have learned to follow the Tao and slaughtered many
chickens effortlessly write code that compiles flawlessly and runs like
a category five hurricane over an ocean of processors. And they don't
do it on Windows.
> *** Number 3) "Finally, the CX1 starts at just $25,000, with fully configured
> systems reaching the $80,000 range. The system is very affordable in terms of
> both the initial capital investment and the lowered total cost of ownership
> customers will see from ease of management and standard office power and
> cooling requirements."
>
> Questions:
> How many blades and cores come in the $25k configuration? (I will answer:
> One blade, just like a
> dual-socket quad-core workstation that you can buy for $5k or less.)
> How "affordable" an investment you need to make install it in your office?
> (See RGB's posting for the answers.)
Just to end the speculation (however fun:-), here are a few facts from
its spec sheet. The box is 7U in height. It takes >>4<< 1600 watt
power supplies (2+2) to run the entire system in "fully populated, fully
redundant" mode. Two power supplies only let you run "half populated,
fully redundant mode" (1+1) or of course fully populated, not redundant
(2+0). One power supply can only support four blades, obviously not
redundant, where the blades draw (as I learned from a different source)
roughly 375W each, presumably peak/loaded but the only object I'd really
trust here is a Kill-a-Watt -- well, 2 Kill-a-Watts -- plugged in
between the fully populated CX1 and the wall and running at full load
(maybe doing a good benchmark suite in parallel, so one can drive all
the CPUs to draw peak power synchronously).
So technically, the CX1 "can" run deskside on an office power circuit, as
long as you only populate it with <=3 blades (15A) or <=4 blades (20A)
and nothing else to speak of is on the circuit.
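The circuit arithmetic behind those blade counts fits in a few lines of Python. The 375 W per blade is the estimate quoted above; the 120 V supply and the practice of loading a branch circuit continuously to no more than 80% of its breaker rating are my assumptions (typical North American office wiring), not figures from the CX1 spec sheet:

```python
# Assumptions (not from the CX1 spec sheet): 120 V branch circuits, loaded
# continuously to at most 80% of the breaker rating. BLADE_WATTS is the
# estimate quoted in the text; a Kill-a-Watt reading at full load should
# replace it.
VOLTS = 120
CONTINUOUS_PERCENT = 80
BLADE_WATTS = 375


def circuit_budget_watts(amps: int) -> int:
    """Continuous-load power budget for one branch circuit, in watts."""
    return amps * VOLTS * CONTINUOUS_PERCENT // 100


def max_blades(amps: int, other_load_watts: int = 0) -> int:
    """Most blades that fit in the budget after any other load on the circuit."""
    spare = circuit_budget_watts(amps) - other_load_watts
    return max(0, spare // BLADE_WATTS)


if __name__ == "__main__":
    for amps in (15, 20):
        print(f"{amps} A circuit: {circuit_budget_watts(amps)} W budget, "
              f"at most {max_blades(amps)} blades")
    print(f"fully populated (8 blades): {8 * BLADE_WATTS} W")
```

On a 15A circuit this reproduces the three-blade limit. On a 20A circuit it nominally allows five blades (1875 W of a 1920 W budget), which leaves essentially no margin; four blades (1500 W) leave about 420 W for the chassis fans and anything else sharing the circuit. A fully populated box at roughly 3000 W needs two circuits.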
The box is a 7U pedestal: 13" by 18" by 36" (!). In other words, this
sucker is a yard long! Note that they only present you with tasteful
photos of the FRONT of the enclosure, where perspective hides this last
distasteful fact. They ONLY present this view -- check out their
website or brochure: you only see either the front panel or a picture
that at first makes it look like the entire box is the size of a
shoebox, would fit nicely on the corner of your desk.
Not.
It weighs 140 pounds fully populated, 63 pounds empty. Its footprint is a
bit more than 1/2 the width of a standard rack, and about as deep.
Two observations: If you have the room for this puppy "under your desk"
you have a bigger desk than I've got, and like to have very, very warm
feet. Just perfect for those supercomputer users in Iceland and Alaska,
and I'll bet Antarctic research stations are clamoring to get them and
plan to run heavy applications all the time with one on each side of
their desks.
Also, one can easily find and order portable/deskside racks in
sizes ranging from 8U to 24U, with wheels and/or shockmounts. A
top-dollar, shipping grade shockmounted 12U travel rack looks like it
costs around $1400 delivered. Into this one can put as many (e.g.)
Penguin dual-quads in the standard 1U form factor as you wish, at ~$5000 each in the
comparison configuration (3 GHz cores, 2 GB RAM/CPU). To these you can
add the network of your choice -- gigabit ethernet is "free", of course
-- inside an expected $2K budget for rack, cabling, switch, and KVM, and,
for a bit more, a UPS or other gingerbread. Faster networks obviously add
cost at market price. Eight additional inches of width and six+ in
height let you install 88 cores instead of 64 and use true commodity
parts instead of being locked forever to Cray if you want to upgrade or
change vendors.
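The comparison can also be put on a per-core basis. The sketch below is back-of-the-envelope arithmetic in Python using the rough figures quoted in this thread. The 64-core count for a fully configured CX1 (8 blades of 8 cores) and the 11-node layout of the 12U rack are my assumptions, and the $2K infrastructure budget is read as covering the rack itself:

```python
# Rough figures from the discussion; the core counts are assumptions.
CX1_PRICE = 80_000      # "fully configured systems reaching the $80,000 range"
CX1_CORES = 64          # assumed: 8 blades x dual-socket quad-core

NODE_PRICE = 5_000      # dual-socket quad-core 1U node, comparison config
CORES_PER_NODE = 8
NODES = 11              # assumed: 11 x 1U nodes plus 1U for the switch in 12U
INFRASTRUCTURE = 2_000  # rack, cabling, switch, KVM (the text's rough budget)


def dollars_per_core(total_dollars: int, cores: int) -> float:
    return total_dollars / cores


diy_price = NODES * NODE_PRICE + INFRASTRUCTURE
diy_cores = NODES * CORES_PER_NODE

if __name__ == "__main__":
    print(f"CX1:  {CX1_CORES} cores, ${dollars_per_core(CX1_PRICE, CX1_CORES):.0f}/core")
    print(f"rack: {diy_cores} cores, ${dollars_per_core(diy_price, diy_cores):.0f}/core")
```

Under these assumptions the rack of commodity nodes comes in at a little over half the CX1's price per core; a faster interconnect, bought at market price, would narrow the gap somewhat.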
Or, the volume occupied is very, very close to the space that four
minitowers would occupy, which is all one can actually plug into even a
lavishly wired office without renovation anyway. In this case one
spends a straight up ~$4-4.5K per box and you're done (plus network) and
once you go down this road, you can start thinking about a pile of
minitowers with just one quad each etc. I admit you don't get that
active noise cancellation, but noise cancellation headphones are $100
and play music too.
> **** Number 4) "We believe that there are many workstation users today who
> are used to working in a Windows environment and find the thought of moving
> to a more powerful platform like an HPC cluster and the challenge of learning
> a new operating system daunting. By offering an operating system that they
> are familiar with, we believe the barriers to adoption are significantly
> lowered. "
And I believe that they are possibly right for a fairly small value of
"many", especially if one emphasizes the term "user". This is not a
coding or development platform -- it is a platform they want to sell to
people to run canned, commercial code (and that is a sub-market right
there -- several of the blades they sell are obviously HA server blades,
not HPC). The only people likely to buy it for HPC development are
people seeking to develop canned, commercial HPC code to sell those
users, who will then use the programs to do what they are told to do
with them by bosses who themselves don't understand what the systems are
and how they work, only that they are told -- by a marketing droid --
that using the system will permit them to get their work done faster
because, well, look at all the cores! And Cray MEANS supercomputer,
doesn't it?
A true rendering of the release headline might just as well be "Cray
reinvents the SP2" -- five or ten years after the SP2 was rendered
obsolete. A commodity SP2, to be sure. We've come at least that far.
Beyond that, one can do a head to head comparison of this bladed system
and other folks' bladed or other solutions, and I think that most of
what you are getting is that ever so impressive CRAY on the front --
which has a certain cachet, does it not?;-) -- plus something that NO
DOUBT is well engineered, solid hardware, sold for a premium price, that
takes up less room than a rack of the same capacity (but well within a
factor of two in VOLUME and with at least one troublesome yard-long
dimension).
> Comments
> The system seems to be a regular cluster, nicely packed, and perhaps with an
> easy setup procedure.
> The main attraction is that it can run Windows, for those who are afraid of
> Linux.
> Other attractions are the nice looking enclosure and the brand name, that
> make it an object of desire.
I desire it, I desire it. Hell, I'd make room for it in my life and
sweat for it just to run it. I just can't possibly PAY for it, and if I
could I still wouldn't because I could get more for my money elsewise.
Now if Cray is listening in and wants to just GIVE me one (fully
populated, please -- I did install two 20A circuits just for my personal
supercomputer setup in my attic and while I'm a BIT light on AC up there
for 3.2 kW, I can probably scare up enough OPM to do an upgrade) hey,
I'll do random number generator testing and other Good Works on it and
tell everybody on this list what a great system it is! I admit it -- at
heart I'm just a whore, easily bought:-).
Oh, but not even for free would I accept one preinstalled with Windows
unless I were permitted to just pop Fedora onto it first thing. I may
be cheaply bought but I'm not stupid, and Windows would cost me way more
than the value of the hardware in my time, on this just as it has every
other system I've ever installed or managed with Windows on it. Windows
management is expensive as all hell, whether you do it yourself or pay
others to do it for you. One of the all-time great success stories of
marketing is that Microsoft has managed to hide or obscure this fact
from a truly astounding number of people for well over a decade.
> The problem with unsubstantiated statements like these on the HPC Wire
> interview
> is that they catch the attention of decision makers (Deans, department heads,
> as someone mentioned here), and you may have a hard time distilling and
> deconstructing them.
> Unless the decision maker has a background or a very good gut feeling for
> computers and
> the underlying physics/engineering, the shiny brochures, the movies,
> and the big brand names can really make a dent.
>
> I had to write a long explanation, going down to details very similar to
> those RGB raised here
> (but not with the same sharp humor), to justify buying a cluster as opposed
> to a
> "turnkey" solution akin to this "deskside supercomputer" just two weeks ago.
> I went through the same arguments that RGB used here: environmental issues
> (power, A/C, floor space), TCO,
> sys admin and maintenance, pros and cons of COTS vs. proprietary HW and SW,
> etc.
> I won, but now I'll probably have to "dejavu it all over again",
> when people here learn about the CX1.
Don't forget, one can get turnkey linux clusters too. Penguin/Scyld
would love to sell you one. Joe and Scalable Informatics would cheerily
custom engineer you one. Or you can EASILY get "semi-turnkey" -- a
rackful of 1U systems all wired up and ready for you to just drop on
your own favorite OS image, which for DHCP/PXE installs is about as
difficult as, well, configuring a server with a repo and, e.g., a kickstart
file, and turning the nodes on.
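For the "semi-turnkey" route, much of the per-node setup is telling the DHCP server which MAC address gets which name and address. A minimal Python sketch (the function name, naming scheme, and addresses are hypothetical) that emits ISC dhcpd `host` declarations; the global `next-server` and `filename` options that point the nodes at the PXE boot loader are left to the rest of dhcpd.conf:

```python
import ipaddress


def dhcpd_host_entries(macs: list[str], first_ip: str, prefix: str = "node") -> str:
    """Return ISC dhcpd 'host' declarations, one per MAC address.

    Node i (counting from 0) is named <prefix> followed by i + 1 as two
    digits and gets the fixed address first_ip + i.
    """
    base = ipaddress.IPv4Address(first_ip)
    stanzas = []
    for i, mac in enumerate(macs):
        stanzas.append(
            f"host {prefix}{i + 1:02d} {{\n"
            f"  hardware ethernet {mac.lower()};\n"
            f"  fixed-address {base + i};\n"
            "}\n"
        )
    return "".join(stanzas)


if __name__ == "__main__":
    # Hypothetical MACs, e.g. read off the nodes' labels or the switch.
    print(dhcpd_host_entries(["00:16:3E:00:00:01", "00:16:3E:00:00:02"], "10.0.0.101"))
```

Fixed addresses keep node names stable across reinstalls, which makes the "drop on your own OS image" step repeatable.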
> If my boss was not knowledgeable in Physics and computing,
> I would have to invite RGB to come here to give my boss a briefing.
> And to recommend buying a beer keg refrigerator along with a cluster, of
> course.
And if you (or rather he) paid me vast sums of money, I'd be happy to
come give him one, keg and all. However, if I didn't charge $250+/hour
plus travel and expenses for my time, your boss wouldn't listen to me
anyway, and baby needs a new pair of shoes... because he's away in
college and shoes, they are EXPENSIVE in college.
I do think we're on the track of something important here, though. A
beer-cooled cluster. This is brilliant! Why hasn't anyone thought of
it before? They make half- and quarter-keg sized refrigerators designed
to keep a liquid at a steady 40F temperature. Beer contains alcohol,
sort of like antifreeze only "less toxic" to the environment, um, acts
like a preservative or something. The bubbles, lessee, they act as
shock absorbers against thermal expansion as the cold beer absorbs the
CPU's heat and returns to the keg for recooling.
Best of all, when a node breaks the beer is much cheaper than
antidepressants and therapy for the person who comes to replace it (and
hey, one has to drain the entire system when that occurs and refill it
with fresh beer, and one hates to waste anything, right?:-)
rgb
>
> Gus Correa
>
>
--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977