[Beowulf] bizarre scaling behavior on a Nehalem
Mikhail Kuzminsky
kus at free.net
Fri Aug 14 08:08:32 PDT 2009
In message from Bill Broadley <bill at cse.ucdavis.edu> (Thu, 13 Aug 2009
17:09:24 -0700):
>Tom Elken wrote:
>> To add some details to what Christian says, the HPC Challenge version of
>> STREAM uses dynamic arrays and is hard to optimize. I don't know what's
>> best with current compiler versions, but you could try some of these that
>> were used in past HPCC submissions with your program, Bill:
>
>Thanks for the heads up, I've checked the specbench.org compiler options for
>hints on where to start with optimization flags, but I didn't know about the
>dynamic stream.
>
>Is the HPC challenge code open source?
Yes, it is open source.
>
>> PathScale 2.2.1 on Opteron:
>> Base OPT flags: -O3 -OPT:Ofast:fold_reassociate=0
>> STREAMFLAGS=-O3 -OPT:Ofast:fold_reassociate=0 -OPT:alias=restrict:align_unsafe=on -CG:movnti=1
>
>Alas my pathscale license expired and I believe with sci-cortex's death (RIP)
>I can't renew it.
Now I see that we were wise :-)
(we purchased a perpetual academic license). BTW, does
anybody know about the future of the PathScale compilers (if there is one)?
Mikhail
>
>I tried open64-4.2.2 with those flags and on a nehalem single socket:
>
>$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
>$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
>
>$ ./stream-open64
>Total memory required = 457.8 MB.
>Function Rate (MB/s) Avg time Min time Max time
>Copy: 22061.4958 0.0145 0.0145 0.0146
>Scale: 22228.4705 0.0144 0.0144 0.0145
>Add: 20659.2638 0.0233 0.0232 0.0233
>Triad: 20511.0888 0.0235 0.0234 0.0235
>
>Dynamic:
>$ ./stream-open64-malloc
>
>Function Rate (MB/s) Avg time Min time Max time
>Copy: 14436.5155 0.0222 0.0222 0.0222
>Scale: 14667.4821 0.0218 0.0218 0.0219
>Add: 15739.7070 0.0305 0.0305 0.0305
>Triad: 15770.7775 0.0305 0.0304 0.0305
>
>> Intel C/C++ Compiler 10.1 on Harpertown CPUs:
>> Base OPT flags: -O2 -xT -ansi-alias -ip -i-static
>> Intel recently used
>> Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
>> -O2 -xSSE4.2 -ansi-alias -ip
>> and got good STREAM results in their HPCC submission on their
>> Endeavor cluster.
>
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
>
>$ ./stream-icc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy: 14767.0512 0.0022 0.0022 0.0022
>Scale: 14304.3513 0.0022 0.0022 0.0023
>Add: 15503.3568 0.0031 0.0031 0.0031
>Triad: 15613.9749 0.0031 0.0031 0.0031
>$ ./stream-icc-malloc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy: 14604.7582 0.0022 0.0022 0.0022
>Scale: 14480.2814 0.0022 0.0022 0.0022
>Add: 15414.3321 0.0031 0.0031 0.0031
>Triad: 15738.4765 0.0031 0.0030 0.0031
>
>So ICC does manage zero penalty, alas no faster than open64 with the
>penalty.
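
To put numbers on that remark, here is a small C sketch that divides each
malloc rate by the corresponding static rate. The rates are copied from the
results quoted above; the names rate_ratio and print_ratios are only
illustrative and are not part of stream.c. By my arithmetic the open64
ratios come out at roughly 0.65 to 0.77, and the icc ratios stay within
about 2% of 1.0.

```c
#include <assert.h>
#include <stdio.h>

/* Rates in MB/s, copied from the quoted runs (Copy, Scale, Add, Triad). */
static const char *const kernel[4] = { "Copy", "Scale", "Add", "Triad" };
static const double open64_static[4] = { 22061.4958, 22228.4705, 20659.2638, 20511.0888 };
static const double open64_malloc[4] = { 14436.5155, 14667.4821, 15739.7070, 15770.7775 };
static const double icc_static[4]    = { 14767.0512, 14304.3513, 15503.3568, 15613.9749 };
static const double icc_malloc[4]    = { 14604.7582, 14480.2814, 15414.3321, 15738.4765 };

/* Fraction of the static-array bandwidth that the malloc build reaches. */
double rate_ratio(double dynamic_rate, double static_rate)
{
    assert(static_rate > 0.0);
    return dynamic_rate / static_rate;
}

/* Print one line per kernel with the dynamic/static ratio for each compiler. */
void print_ratios(void)
{
    int i;
    for (i = 0; i < 4; i++)
        printf("%-6s open64 %.3f  icc %.3f\n", kernel[i],
               rate_ratio(open64_malloc[i], open64_static[i]),
               rate_ratio(icc_malloc[i], icc_static[i]));
}
```

Calling print_ratios() from a main() prints the four ratio pairs. A ratio
near 1.0 means no measurable penalty for the dynamic arrays.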
>
>I'll attempt to track down the HPCC stream source code to see if their dynamic
>arrays are any friendlier than mine (I just use malloc).
>
>In any case many thanks for the pointer.
>
>Oh, my dynamic tweak:
>$ diff stream.c stream-malloc.c
>43a44
>> # include <stdlib.h>
>97c98
>< static double a[N+OFFSET],
>---
>> /* static double a[N+OFFSET],
>99c100,102
>< c[N+OFFSET];
>---
>> c[N+OFFSET]; */
>>
>> double *a, *b, *c;
>134a138,142
>>
>> a=(double *)malloc(sizeof(double)*(N+OFFSET));
>> b=(double *)malloc(sizeof(double)*(N+OFFSET));
>> c=(double *)malloc(sizeof(double)*(N+OFFSET));
>>
>283c291,293
><
>---
>> free(a);
>> free(b);
>> free(c);
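
One thing that might be worth trying with the malloc version (a sketch
only, I have not measured it): the PathScale flags quoted above mention
alias=restrict and alignment, so it is an assumption on my part that
pointer aliasing and alignment are what the compiler loses when the arrays
become plain double pointers. The helper names alloc_aligned and triad
below are hypothetical and do not come from stream.c. posix_memalign is
the POSIX call, and restrict is the C99 qualifier.

```c
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: allocate n doubles on a 64-byte boundary.
   Returns NULL on failure; release the memory with free(). */
double *alloc_aligned(size_t n)
{
    void *p = NULL;
    assert(n > 0);
    if (posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* STREAM-style triad, a[j] = b[j] + scalar*c[j].
   The restrict qualifiers promise the compiler that a, b and c do not
   overlap, which it cannot prove by itself for malloc'ed pointers. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    size_t j;
    for (j = 0; j < n; j++)
        a[j] = b[j] + scalar * c[j];
}
```

In stream-malloc.c this would mean replacing the three malloc calls with
alloc_aligned(N+OFFSET) and moving the kernels into functions like triad.
Whether it closes the gap with open64 is something only a measurement can
tell.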