[Beowulf] Woodcrest Memory bandwidth

Mon Aug 14 13:36:29 PDT 2006

Joe Landman wrote:

 >4-threads
 >
 >Copy:        6645.4125       0.0965       0.0963       0.0976
 >Scale:       6994.6233       0.0916       0.0915       0.0917
 >Add:         6373.0207       0.1508       0.1506       0.1509
 >Triad:       6710.7522       0.1432       0.1431       0.1433
 >
 >I may have been Bill's 10 GB/s source, and that may have been a mixup
 >on my part.

10 GB/sec of course comes from the advertised bandwidth off a single socket.

Yes, this is quite disappointing because the "on-paper" numbers from each
socket to the Northbridge are nicely balanced with the 4-channel FB-DIMM
numbers.  Then there is all the discussion of the advantages of the
shared L2 cache, the shared-cache-intelligent pre-fetch engines, and the
cool memory disambiguation.  Seemingly irrelevant, I guess, if the
Northbridge is still underdesigned.
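
For scale, the 4-thread rates quoted above can be set against the 10 GB/sec
single-socket figure.  This is only a rough sketch: it assumes
1 GB/sec = 1000 MB/sec (STREAM reports MB/s in units of 10^6 bytes), and
the names in it are illustrative, not taken from any benchmark source.

```python
# 4-thread STREAM rates in MB/s, copied from the quoted results above.
measured_mb_s = {
    "Copy": 6645.4125,
    "Scale": 6994.6233,
    "Add": 6373.0207,
    "Triad": 6710.7522,
}

# Advertised single-socket bandwidth; ASSUMES 1 GB/s = 1000 MB/s.
ADVERTISED_MB_S = 10_000.0


def fraction_of_advertised(rate_mb_s, advertised_mb_s=ADVERTISED_MB_S):
    """Return the measured rate as a fraction of the advertised figure."""
    return rate_mb_s / advertised_mb_s


for kernel, rate in measured_mb_s.items():
    share = fraction_of_advertised(rate)
    print(f"{kernel:<6} {rate:10.1f} MB/s  {share:.0%} of 10 GB/s")
```

Under that assumption, all four kernels land between roughly 64% and 70%
of the single-socket figure.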

Is it possible that the compilers are just not ready to use some of
these features effectively?  On the other hand, STREAM is sufficiently
simple that these features probably do not come into play anyway.  The
real application benchmarks with some quantity of locality look better.
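
To illustrate how simple the STREAM kernels are, here is a minimal sketch
of the four loops in plain Python.  It is not the benchmark itself (which
is written in C and Fortran and timed over large arrays); the function
name, array length, and scalar value are assumptions made for
illustration.  Each loop touches every element exactly once, so there is
little data reuse for a shared cache to exploit.

```python
# Bytes STREAM counts per loop iteration, assuming 8-byte doubles:
# Copy and Scale move 2 words (1 read, 1 write); Add and Triad move 3.
BYTES_PER_WORD = 8
bytes_per_iteration = {
    "Copy": 2 * BYTES_PER_WORD,
    "Scale": 2 * BYTES_PER_WORD,
    "Add": 3 * BYTES_PER_WORD,
    "Triad": 3 * BYTES_PER_WORD,
}


def stream_kernels(n=1000, scalar=3.0):
    """Run the four STREAM-style loops once over lists of length n."""
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n

    for i in range(n):          # Copy:  c = a
        c[i] = a[i]
    for i in range(n):          # Scale: b = scalar * c
        b[i] = scalar * c[i]
    for i in range(n):          # Add:   c = a + b
        c[i] = a[i] + b[i]
    for i in range(n):          # Triad: a = b + scalar * c
        a[i] = b[i] + scalar * c[i]

    return a, b, c
```

With one streaming pass per kernel and no element revisited inside a
loop, the rate is set almost entirely by how fast memory can feed the
cores, which is consistent with the cache features not showing up in
these numbers.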

Anyone working on compilers care to comment on what the bottleneck
really is?

rbw

-- 

Richard B. Walsh

Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org  |  612.337.3467
