[Beowulf] HPCG benchmark, again
Jörg Saßmannshausen
sassy-work at sassy.formativ.net
Mon Mar 21 17:25:21 UTC 2022
Dear all,
first of all: many thanks for all the replies and suggestions, which are very
much appreciated!
I should have explained better: we already have some 'real' programs in the mix,
ones we actually use, which test not only the CPU and GPU but also the
interconnect between nodes. Thus, the benchmarks will run not only on a single
node but also across nodes. The 'real' programs also rely on memory bandwidth,
and we know how they should perform and when there is a memory bottleneck.
Apologies for my lack of communication here.
Otherwise: from what I have read, and given that the HPCG benchmark is only part
of the mix, I think the takeaway message is that it is a good benchmark to use.
What I am after is a set of numbers produced by some kind of standard and
reproducible testing. Standard here does not mean ISO or suchlike, but that the
same installation procedure has been used, with all its pros and cons, on
different machines. Reproducible means: more than one test run, please, so we
get a reliable number and not just the best one.
That is probably me wearing my silly scientist hat. :D
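To make "more than one run" concrete, here is a minimal sketch (my addition,
with made-up numbers) of how repeated benchmark results could be aggregated so
that the median and the run-to-run spread get reported rather than only the
best run:

```python
import statistics

def summarize_runs(gflops):
    """Aggregate repeated benchmark results: report the median and the
    run-to-run spread, not just the single best number."""
    best = max(gflops)
    median = statistics.median(gflops)
    spread_pct = 100.0 * (best - min(gflops)) / median
    return {"best": best, "median": median, "spread_pct": spread_pct}

# Five hypothetical HPCG results (GFLOP/s) from the same node:
print(summarize_runs([41.2, 40.8, 41.5, 40.9, 41.1]))
```

A spread of more than a few percent between runs is itself a useful acceptance
signal: it suggests the system is not in a stable, tuned state.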
As we are only interested in the A100 cards, for one reason or another, that
will narrow things down quite a bit as well.
The reason behind that, other than simply making a decision about which one to
buy, is that we also want to use it as an acceptance test. Thus, if a vendor
claims 'I get 100X out of the system!' and said vendor can only get 1X out
after installation, then the vendor clearly has a problem. Hence my stance on
standardisation and reproducibility.
I will ask the rest of the team whether or not they are interested in the other
tests that were suggested. So many thanks for pointing them out to me! Most of
them I did not know.
In any case: thanks again and please stay safe, all of you!
Jörg
On Saturday, 19 March 2022, 12:40:12 GMT, Richard Walsh wrote:
> All,
>
> We are now, perhaps unknowingly and implicitly, discussing the definitions of
> and differences between synthetic and application benchmarks, and what
> emphasis each should be given in evaluating competing candidate systems for
> purchase.
>
> It is of course best to use both and to recognize the particular narrowness
> of stand-alone synthetics.
>
> For instance, as suggested, adding communication synthetics to the mix is
> good, while recognizing their limitations … by themselves they tell you
> little about how a fabric will behave under the congestive load of the job
> mix that will typically be running on a production system. If you are
> interested in that (which I think you should be), then you might like to
> run GPCNET:
>
> https://github.com/netbench/GPCNET
>
> Or, as is often done, define a job-mix throughput test.
>
> No complex object intended to be used for a complex purpose can be evaluated
> by a couple of numbers.
>
> Think of the problem of choosing an HPC system fit for your purposes as a
> job interview for a local open position for a system or performance
> engineer, and benchmark it accordingly.
>
> rbw
>
> Sent from my iPhone
>
> > On Mar 19, 2022, at 2:18 AM, Benson Muite <benson_muite at emailplus.org>
> > wrote:
> >
> > For memory bandwidth, single node tests such as Likwid are helpful
> > https://github.com/RRZE-HPC/likwid
> >
> > MPI communication benchmarks are a good complement to this.
> >
> > Full applications do more than the above, but these are easier starting
> > points that require less domain-specific application knowledge for
> > general performance measurement.
> >> On 3/19/22 3:58 AM, Richard Walsh wrote:
> >> J,
> >> Trying to add a bit to the preceding useful answers …
> >> In my experience running these codes on very large systems for
> >> acceptances, to get optimal HPCG or HPL performance on GPUs (MI200 or
> >> A100) you need to obtain the optimized versions from the vendors, which
> >> include scripts with ENV variable tunings specific to their versions
> >> and optimal affinity settings to manage the non-trivial relationship
> >> between the NICs, the GPUs, and the CPUs … you have to iterate through
> >> the settings to find the optimal ones for your system. If you set out to
> >> do this on your own, the chances of getting values similar to those
> >> posted on the TOP500 website are vanishingly small … As already noted,
> >> buyers of large HPC systems almost always require large-scale runs of
> >> both HPCG (to demonstrate peak bandwidth) and HPL (to demonstrate peak
> >> processor performance). Cheers!
> >> rbw
> >> Sent from my iPhone
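The iteration Richard describes can be sketched as a sweep over tuning
combinations, one benchmark run per setting. The knob names below are
placeholders of my own; a vendor's HPCG/HPL package documents the ENV
variables it actually exposes:

```python
import itertools

# Placeholder tuning knobs; substitute the ENV variables your vendor's
# optimized HPCG/HPL scripts actually expose.
sweep = {
    "OMP_NUM_THREADS": ["8", "16", "32"],
    "GPU_AFFINITY": ["0:1", "1:0"],
}

def tuning_grid(sweep):
    """Yield every combination of ENV settings, one benchmark run each."""
    keys = list(sweep)
    for values in itertools.product(*(sweep[k] for k in keys)):
        yield dict(zip(keys, values))

for env in tuning_grid(sweep):
    # A real harness would run the benchmark here, e.g.
    # subprocess.run(["./xhpcg"], env={**os.environ, **env})
    print(env)
```

Recording the result of every combination, rather than only the winner, also
feeds straight back into the reproducibility requirement discussed earlier in
the thread.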
> >>
> >>>> On Mar 18, 2022, at 7:35 PM, Massimiliano Fatica <mfatica at gmail.com>
> >>>> wrote:
> >>>
> >>> HPCG measures memory bandwidth; the FLOPS capability of the chip is
> >>> completely irrelevant. Pretty much all the vendor implementations reach
> >>> very similar efficiency if you compare them to the available memory
> >>> bandwidth. There is some effect of the network at scale, but you need
> >>> to have a really large system to see it in play.
> >>>
> >>> M
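Massimiliano's point can be put as a back-of-the-envelope roofline: a sparse
solve moves far more bytes than it computes FLOPs, so achievable GFLOP/s is
bounded by memory bandwidth times arithmetic intensity, and peak FLOPS never
enters the estimate. The numbers below are illustrative assumptions of mine,
not measured HPCG figures:

```python
def bandwidth_bound_gflops(mem_bw_gbs, flops_per_byte):
    """Roofline ceiling for a bandwidth-bound kernel: GFLOP/s cannot
    exceed bytes/s delivered times FLOPs performed per byte."""
    return mem_bw_gbs * flops_per_byte

# Assumed: ~2000 GB/s of HBM bandwidth and ~0.15 FLOP/byte for a sparse
# matrix-vector kernel -- the chip's peak FLOPS does not appear anywhere.
print(bandwidth_bound_gflops(2000.0, 0.15))  # 300.0 GFLOP/s ceiling
```

This is why vendor implementations converge on similar efficiency once you
normalize by available memory bandwidth: they are all pushing against the same
ceiling.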
> >>>
> >>>> On Fri, Mar 18, 2022 at 5:20 PM Brian Dobbins <bdobbins at gmail.com>
> >>>> wrote:
> >>> Hi Jorg,
> >>>
> >>> We (NCAR - weather/climate applications) tend to find that HPCG
> >>> more closely tracks the performance we see from hardware than
> >>> Linpack, so it definitely is of interest and watched, but our
> >>> procurements tend to use actual code that vendors run as part of
> >>> the process, so we don't 'just' use published HPCG numbers.
> >>> Still, I'd say it's very much a useful number.
> >>> As one example, while I haven't seen HPCG numbers for the MI250X
> >>> accelerators, Prof. Matsuoka of RIKEN tweeted back in November that
> >>> he anticipated it would score around 0.4% of peak on HPCG, vs 2% on
> >>> the NVIDIA A100 (while the A64FX they use hits an impressive 3%):
> >>> https://twitter.com/ProfMatsuoka/status/1458159517590384640
> >>>
> >>> Why is that relevant? Well, /on paper/, the MI250X has ~96 TF of
> >>> FP64 w/ matrix operations, vs 19.5 TF on the A100. So, 5x in
> >>> theory, but Prof. Matsuoka anticipated a ~5x efficiency gap in HPCG
> >>> in the other direction, /erasing/ that differential. Now, surely
> >>> /someone/ has HPCG numbers on the MI250X, but I've not yet seen
> >>> any. Would love to know what they are. But absent that information
> >>> I tend to bet Matsuoka isn't far off the mark.
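Brian's arithmetic, written out (the peak figures come from his message and
the efficiency fractions from the tweet he cites): multiplying the paper peak
by the anticipated HPCG fraction of peak puts the two parts effectively level.

```python
# Paper FP64 matrix peak (TF) times anticipated HPCG fraction of peak:
mi250x_effective = 96.0 * 0.004   # MI250X at ~0.4% of peak
a100_effective = 19.5 * 0.02      # A100 at ~2% of peak

print(mi250x_effective, a100_effective)  # ~0.384 vs ~0.39 TF: roughly a wash
```

The 5x paper advantage and the 5x efficiency gap cancel almost exactly, which
is the "erasing that differential" point above.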
> >>>
> >>> Ultimately, it may help to know more about what kind of
> >>> applications you run - for memory-bound CFD-like codes, HPCG tends
> >>> to be pretty representative.
> >>>
> >>> Maybe it's time to update the saying that 'numbers never lie' to
> >>> something more accurate: 'numbers never lie, but they also rarely
> >>> tell the whole story'.
> >>>
> >>> Cheers,
> >>> - Brian
> >>>
> >>> On Fri, Mar 18, 2022 at 5:08 PM Jörg Saßmannshausen
> >>> <sassy-work at sassy.formativ.net> wrote:
> >>> Dear all,
> >>>
> >>> further to the emails back in 2020 around the HPCG benchmark test,
> >>> as we are in the process of getting a new cluster I was wondering
> >>> whether somebody else has in the meantime used that test to
> >>> benchmark the particular performance of the cluster.
> >>> From what I can see, the latest HPCG version is 3.1 from August
> >>> 2019. I have also noticed that their website has a link to download
> >>> a version which includes the latest A100 GPUs from NVIDIA:
> >>> https://www.hpcg-benchmark.org/software/view.html?id=280
> >>>
> >>> What I was wondering is: has anybody else apart from Prentice
> >>> tried that test, and is it actually useful, or does it just give
> >>> you another set of numbers?
> >>>
> >>> Our new cluster will not be in the same league as the
> >>> supercomputers, but we would like to have at least some kind of
> >>> handle so we can compare the various offers from vendors. My hunch
> >>> is that the benchmark will somehow (strongly?) depend on how it is
> >>> tuned. As my former colleague used to say: I am looking for some
> >>> war stories (not very apt to say these days!).
> >>>
> >>> Either way, I hope you are all well given the strange new world we
> >>> are living in right now.
> >>>
> >>> All the best from a spring-like, dark London
> >>>
> >>> Jörg
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >>> To change your subscription (digest mode or unsubscribe) visit
> >>> https://beowulf.org/cgi-bin/mailman/listinfo/beowulf