[Beowulf] Re: HPL input file
Mark Hahn
hahn at mcmaster.ca
Mon Feb 26 07:39:34 PST 2007
> I am trying to find out the speed of my cluster using HPL but I am not able
> to understand what values to set in HPL.dat to find out the peak performance
> (e.g. the values of N, NB, PxQ, etc). Kindly help me in this regard.
following is the HPL.dat I'm currently using as a load-generator
for my cluster's 8GB dual-socket-single-core nodes. it's not for
generating HPL scores, but rather just to stress the system.
comments:
- you choose the problem size to match your memory - too low a value
will result in not enough work per cpu and lower efficiency. on my
system, I found no significant advantage to using more than 1GB/proc,
but that should depend on the CPU and interconnect speed. (faster
cpus need more work per process to amortize communication; a faster
interconnect lowers the amount of work needed.)
- I didn't find any strong dependence on NB.
- P*Q=ncpus; for a switched interconnect, conventional wisdom is that
you want PxQ to be close to square. on my machine (full-bisection
quadrics with dual-processor nodes) I think I've measured it being
slightly faster when run in a 1:2 shape (Q ~= 2P).
- I haven't found any strong performance dependency on any of the
other parameters, but other clusters may be different if they have
slower or non-flat networks, more procs/node, etc.
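The sizing rules in the comments above can be sketched in a few lines of Python. This is only an illustration, not part of HPL: the helper names suggest_n and grid_shapes are made up here, and the memory budget per process is whatever you decide to give HPL (1GB/proc was enough on the system described above). The only hard fact used is that the N x N matrix of doubles takes 8*N^2 bytes in total, spread over all P*Q processes.

```python
import math

def suggest_n(bytes_per_proc, nprocs, nb):
    """Largest N, rounded down to a multiple of NB, such that the
    N x N matrix of 8-byte doubles fits in bytes_per_proc * nprocs.
    (hypothetical helper; ignores HPL's extra workspace)"""
    total_doubles = (bytes_per_proc * nprocs) // 8
    n = math.isqrt(total_doubles)
    return (n // nb) * nb

def grid_shapes(nprocs):
    """All (P, Q) with P*Q == nprocs and P <= Q, most square first.
    (hypothetical helper; which shape is fastest must be measured)"""
    shapes = [(p, nprocs // p)
              for p in range(1, math.isqrt(nprocs) + 1)
              if nprocs % p == 0]
    return sorted(shapes, key=lambda pq: pq[1] - pq[0])

if __name__ == "__main__":
    # 2 processes with a 1 GiB budget each, NB=200
    print(suggest_n(2**30, 2, 200))
    # candidate grids for 32 processes, including the 1:2 shape (4, 8)
    print(grid_shapes(32))
```

The list from grid_shapes is just a set of candidates to try; whether square or Q ~= 2P wins depends on the interconnect, as noted above.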
regards, mark hahn.
HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
5            # of problems sizes (N)
1000 31700 31700 31700 31700  Ns
1            # of NBs
200          NBs
0            PMAP process mapping (0=Row-,1=Column-major)
1            # of process grids (P x Q)
1            Ps
2            Qs
16.0         threshold
1            # of panel fact
1            PFACTs (0=left, 1=Crout, 2=Right)
1            # of recursive stopping criterium
4            NBMINs (>= 1)
1            # of panels in recursion
2            NDIVs
1            # of recursive panel fact.
1            RFACTs (0=left, 1=Crout, 2=Right)
1            # of broadcast
1            BCASTs (0=1rg,1=1rM,2=2rg,3=2rM,4=Lng,5=LnM)
1            # of lookahead depth
1            DEPTHs (>=0)
2            SWAP (0=bin-exch,1=long,2=mix)
64           swapping threshold
0            L1 in (0=transposed,1=no-transposed) form
0            U  in (0=transposed,1=no-transposed) form
1            Equilibration (0=no,1=yes)
8            memory alignment in double (> 0)
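
As a rough check on the problem sizes in the file above, the matrix footprint can be computed directly. This is a small sketch, not anything HPL provides: matrix_bytes is a made-up helper, and it counts only the N x N matrix of 8-byte doubles, not HPL's additional workspace.

```python
def matrix_bytes(n):
    """Bytes needed for an N x N matrix of 8-byte doubles
    (hypothetical helper; excludes HPL's extra workspace)."""
    return 8 * n * n

if __name__ == "__main__":
    nprocs = 1 * 2  # P x Q from the file above
    for n in (1000, 31700):
        total = matrix_bytes(n)
        print(f"N={n}: {total / 2**30:.2f} GiB total, "
              f"{total / nprocs / 2**30:.2f} GiB per process")
```

For N=31700 this comes to about 7.5 GiB in total, which is consistent with using the file as a load generator that fills most of an 8GB node.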