[Beowulf] Woodcrest Memory bandwidth
Richard Walsh
rbw at ahpcrc.org
Mon Aug 14 13:36:29 PDT 2006
Joe Landman wrote:
>4-threads
>
>Copy: 6645.4125 0.0965 0.0963 0.0976
>Scale: 6994.6233 0.0916 0.0915 0.0917
>Add: 6373.0207 0.1508 0.1506 0.1509
>Triad: 6710.7522 0.1432 0.1431 0.1433
>
>I may have been Bill's 10 GB/s source, and that may have been a mixup
>on my part.
The 10 GB/sec figure, of course, comes from the advertised bandwidth of a
single socket.

Yes, this is quite disappointing, because the "on-paper" numbers from each
socket to the Northbridge are nicely balanced with the 4-channel FB-DIMM
numbers. Then there is all the discussion of the advantages of the shared
L2 cache, the shared-cache-intelligent prefetch engines, and the cool
memory disambiguation. All seemingly irrelevant, I guess, if the
Northbridge is still underdesigned.
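To put rough numbers on "balanced", here is a minimal sketch of the on-paper arithmetic. The bus parameters are assumptions and do not come from this thread: a 1333 MT/s, 8-byte-wide front-side bus per socket, and four FB-DIMM channels of DDR2-667. The helper name peak_gb_per_s is made up for illustration. Only the Triad rate is taken from the 4-thread numbers quoted above.

```python
# Back-of-the-envelope peak bandwidths.
# ASSUMPTIONS (not stated in the thread): 1333 MT/s x 8 bytes per FSB,
# two sockets, and four FB-DIMM channels of DDR2-667 (667 MT/s x 8 bytes).

def peak_gb_per_s(transfers_per_s, bytes_per_transfer, count=1):
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) for `count` identical links."""
    return transfers_per_s * bytes_per_transfer * count / 1e9

fsb_per_socket = peak_gb_per_s(1333e6, 8)
fsb_two_sockets = peak_gb_per_s(1333e6, 8, count=2)
fbdimm_four_channels = peak_gb_per_s(667e6, 8, count=4)

# STREAM reports MB/s with 1 MB = 1e6 bytes; value from the quoted run.
measured_triad = 6710.7522 / 1000.0

# Fraction of the assumed two-socket FSB peak that Triad actually reaches.
triad_fraction = measured_triad / fsb_two_sockets

print(f"FSB, one socket:     {fsb_per_socket:.2f} GB/s")
print(f"FSB, two sockets:    {fsb_two_sockets:.2f} GB/s")
print(f"FB-DIMM, 4 channels: {fbdimm_four_channels:.2f} GB/s")
print(f"Triad / FSB peak:    {triad_fraction:.0%}")
```

Under these assumptions the two front-side buses and the four memory channels come out nearly equal on paper, while the measured Triad rate is only a fraction of either.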
Is it possible that the compilers are just not ready to use some of these
features effectively? On the other hand, STREAM is sufficiently simple that
these features probably do not come into play anyway. The real application
benchmarks, which have some amount of locality, look better.
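For reference, a rough sketch of how simple the STREAM kernels are: four single loops over three arrays, with no reuse of data between iterations. The plain-Python rendering below is illustrative only (the real benchmark is compiled C or Fortran and times each loop separately). The names stream_kernels, BYTES_PER_ITER, QUOTED and implied_array_length are hypothetical. The second half assumes the quoted columns are the rate in MB/s followed by three timings, with the minimum time in the third numeric column.

```python
def stream_kernels(n=1000, scalar=3.0):
    """Run the four STREAM operations once on small arrays."""
    a = [1.0] * n
    b = [2.0] * n
    c = [0.0] * n
    for j in range(n):              # Copy:  read a, write c
        c[j] = a[j]
    for j in range(n):              # Scale: read c, write b
        b[j] = scalar * c[j]
    for j in range(n):              # Add:   read a and b, write c
        c[j] = a[j] + b[j]
    for j in range(n):              # Triad: read b and c, write a
        a[j] = b[j] + scalar * c[j]
    return a, b, c

# Bytes counted per loop iteration with 8-byte doubles.
BYTES_PER_ITER = {"Copy": 16, "Scale": 16, "Add": 24, "Triad": 24}

# (rate in MB/s, assumed minimum time in s) from the quoted 4-thread run.
QUOTED = {
    "Copy": (6645.4125, 0.0963),
    "Scale": (6994.6233, 0.0915),
    "Add": (6373.0207, 0.1506),
    "Triad": (6710.7522, 0.1431),
}

def implied_array_length(kernel):
    """Array length implied by rate x time, given the byte count per iteration."""
    rate_mb_s, min_time_s = QUOTED[kernel]
    return rate_mb_s * 1e6 * min_time_s / BYTES_PER_ITER[kernel]

for kernel in QUOTED:
    print(kernel, round(implied_array_length(kernel)))
```

If the column assumption holds, all four kernels imply about the same array length, on the order of 40 million elements per array, which is far larger than any cache on the chip. Each element is touched once per loop, so there is little for a shared L2 or a clever prefetcher to exploit beyond keeping the streams going.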
Anyone working on compilers care to comment on what the bottleneck
really is?
rbw
--
Richard B. Walsh
Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw at ahpcrc.org | 612.337.3467