[Beowulf] bizarre scaling behavior on a Nehalem

Fri Aug 14 08:08:32 PDT 2009

In message from Bill Broadley <bill at cse.ucdavis.edu> (Thu, 13 Aug 2009 
17:09:24 -0700):
>Tom Elken wrote:
>> To add some details to what Christian says, the HPC Challenge version of
>> STREAM uses dynamic arrays and is hard to optimize.  I don't know what's
>> best with current compiler versions, but you could try some of these that
>> were used in past HPCC submissions with your program, Bill:
>
>Thanks for the heads-up. I've checked the specbench.org compiler options for
>hints on where to start with optimization flags, but I didn't know about the
>dynamic-array STREAM.
>
>Is the HPC challenge code open source?

Yes, the source is open.

>
>> PathScale 2.2.1 on Opteron:
>> Base OPT flags: -O3 -OPT:Ofast:fold_reassociate=0 
>> STREAMFLAGS=-O3 -OPT:Ofast:fold_reassociate=0 -OPT:alias=restrict:align_unsafe=on -CG:movnti=1
>
>Alas, my PathScale license expired, and I believe that with SiCortex's death
>(RIP) I can't renew it.

Now I understand that I was wise :-)
(we purchased a perpetual academic license). BTW, does
anybody know about the future of the PathScale compilers (if there is one)?

Mikhail

>
>I tried open64-4.2.2 with those flags on a single-socket Nehalem:
>
>$ opencc -O4 -fopenmp stream.c -o stream-open64 -static
>$ opencc -O4 -fopenmp stream-malloc.c -o stream-open64-malloc -static
>
>$ ./stream-open64
>Total memory required = 457.8 MB.
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       22061.4958       0.0145       0.0145       0.0146
>Scale:      22228.4705       0.0144       0.0144       0.0145
>Add:        20659.2638       0.0233       0.0232       0.0233
>Triad:      20511.0888       0.0235       0.0234       0.0235
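For anyone comparing the tables by hand, here is a minimal sketch of how STREAM derives the Rate column from the Min time column. The function and enum names are mine, not taken from stream.c. N = 20,000,000 with OFFSET = 0 is an inference from "Total memory required = 457.8 MB" (3 arrays x 8 bytes x 20,000,000 elements is about 457.8 MiB), not something stated in the output.

```c
#include <assert.h>
#include <stddef.h>

/* Kernel indices in the order STREAM prints them (names are illustrative). */
enum stream_kernel { COPY = 0, SCALE = 1, ADD = 2, TRIAD = 3 };

/* Arrays touched per loop iteration: Copy and Scale read one array and
 * write one; Add and Triad read two arrays and write one. */
static const int arrays_touched[4] = { 2, 2, 3, 3 };

/* Bandwidth in MB/s (1 MB = 1e6 bytes) for arrays of n doubles, given the
 * best (minimum) time in seconds for one pass of the kernel. */
double stream_rate_mbs(enum stream_kernel k, size_t n, double min_time_s)
{
    assert(min_time_s > 0.0);
    double bytes = (double)arrays_touched[k] * (double)sizeof(double) * (double)n;
    return 1.0e-6 * bytes / min_time_s;
}
```

Under that assumed N, stream_rate_mbs(COPY, 20000000, 0.0145) comes out near 22069 MB/s, close to the 22061.5 printed above; the gap is presumably because the printed time is rounded to four decimals.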
>
>Dynamic:
>$ ./stream-open64-malloc
>
>Function      Rate (MB/s)   Avg time     Min time     Max time
>Copy:       14436.5155       0.0222       0.0222       0.0222
>Scale:      14667.4821       0.0218       0.0218       0.0219
>Add:        15739.7070       0.0305       0.0305       0.0305
>Triad:      15770.7775       0.0305       0.0304       0.0305
>
>> Intel C/C++ Compiler 10.1 on Harpertown CPUs:
>> Base OPT flags:	 -O2 -xT -ansi-alias -ip -i-static
>> Intel recently used
>> Intel C/C++ Compiler 11.0.081 on Nehalem CPUs:
>> 	 -O2 -xSSE4.2 -ansi-alias -ip
>> and got good STREAM results in their HPCC submission on their Endeavor cluster.
>
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream.c -o stream-icc
>$ icc -O2 -xSSE4.2 -ansi-alias -ip -openmp stream-malloc.c -o stream-icc-malloc
>
>$ ./stream-icc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14767.0512       0.0022       0.0022       0.0022
>Scale:      14304.3513       0.0022       0.0022       0.0023
>Add:        15503.3568       0.0031       0.0031       0.0031
>Triad:      15613.9749       0.0031       0.0031       0.0031
>$ ./stream-icc-malloc | grep ":"
>STREAM version $Revision: 5.9 $
>Copy:       14604.7582       0.0022       0.0022       0.0022
>Scale:      14480.2814       0.0022       0.0022       0.0022
>Add:        15414.3321       0.0031       0.0031       0.0031
>Triad:      15738.4765       0.0031       0.0030       0.0031
>
>So ICC does manage zero penalty for the dynamic arrays, but alas it is no
>faster than open64 with the penalty.
>
>I'll attempt to track down the HPCC STREAM source code to see if their dynamic
>arrays are any friendlier than mine (I just use malloc).
>
>In any case many thanks for the pointer.
>
>Oh, my dynamic tweak:
>$ diff stream.c stream-malloc.c
>43a44
>> # include <stdlib.h>
>97c98
>< static double	a[N+OFFSET],
>---
>> /* static double	a[N+OFFSET],
>99c100,102
>< 		c[N+OFFSET];
>---
>> 		c[N+OFFSET]; */
>>
>> double *a, *b, *c;
>134a138,142
>>
>>     a=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     b=(double *)malloc(sizeof(double)*(N+OFFSET));
>>     c=(double *)malloc(sizeof(double)*(N+OFFSET));
>>
>283c291,293
><
>---
>>     free(a);
>>     free(b);
>>     free(c);
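One thing that may be worth trying on top of this diff is a sketch like the one below. It assumes the open64 penalty comes from the compiler being unable to prove that the three malloc'd pointers do not overlap, and/or from their alignment; that is my guess, prompted by the -OPT:alias=restrict flag quoted above, and not something measured in this thread. The helper names alloc_stream_array and triad are hypothetical, and the 64-byte alignment is an assumed cache-line size.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
#endif
#include <assert.h>
#include <stdlib.h>

/* Allocate n doubles on a 64-byte boundary (assumed cache-line size).
 * Returns NULL on failure; release the memory with free(). */
double *alloc_stream_array(size_t n)
{
    void *p = NULL;
    if (posix_memalign(&p, 64, n * sizeof(double)) != 0)
        return NULL;
    return p;
}

/* Triad kernel: a[i] = b[i] + scalar * c[i].
 * The restrict qualifiers promise the compiler that a, b and c never
 * overlap, which a static array declaration tells it implicitly. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t i = 0; i < n; i++)
        a[i] = b[i] + scalar * c[i];
}
```

In stream-malloc.c this would mean replacing the three malloc calls with alloc_stream_array(N+OFFSET) and moving each timed loop into a function taking restrict-qualified pointers. Whether that closes the gap would have to be measured.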
>
>
>
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf
>