[Beowulf] Stroustrup regarding multicore
Robert G. Brown
rgb at phy.duke.edu
Tue Aug 26 13:52:28 PDT 2008
On Tue, 26 Aug 2008, Lux, James P wrote:
>> No. You are actually given guarantees about memory layout. They're not
>> phrased as such, but they're quite rigid. (This is rather different
>> from the situation with, for example, pointers, where you are
>> explicitly not guaranteed that pointer types are interchangeable.)
>>
>> Perry
>
>
> Interesting...
> We have coding standards here (JPL) for flight software (derived in
> large part from MISRA), which I will readily concede is NOT generally
> HPC computing, that assert that one cannot depend on a particular memory
> layout unless it's explicitly defined somehow.
I think that matrix allocation per se is contiguous because of how and
where it occurs at the compiler level. Automatic (local) matrices
declared like a[10][10] are just translated into displacements relative
to the stack pointer, are they not? Global/external matrices declared
as a[10][10] are laid out in the data segment of the program, and are
doubtless allocated all at once as a single memory block by the
kernel/linker at load time; the compiler simply computes relative
displacements from a base address that gets fixed when the program is
loaded.
The only place where it might not be true is when one dynamically
allocates memory for a **matrix. There, if you loop through malloc
calls, there is no guarantee that they return contiguous blocks and your
matrix could be spread out over memory in nearly arbitrary ways. If you
malloc a single block and then pack its displacements into a vector of
pointers, you are guaranteed that the data space itself is contiguous.
For numerical code, I (nearly) always do the latter so that the
resulting matrix can't "accidentally" thrash the cache. It's also up to
the user to know which index is the fast (contiguous) one and which one
takes the longer stride, and to code accordingly when possible.
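Roughly, the two approaches look like this (an untested sketch with my
own names, error cleanup abbreviated):

#include <stdlib.h>

/* One malloc per row: nothing guarantees the rows land anywhere near
   each other, so the matrix can end up scattered all over memory. */
double **matrix_scattered(size_t rows, size_t cols)
{
   double **m = malloc(rows * sizeof(double *));
   size_t i;
   if (m == NULL) return NULL;
   for (i = 0; i < rows; i++) {
      m[i] = malloc(cols * sizeof(double));
      if (m[i] == NULL) return NULL;   /* cleanup omitted */
   }
   return m;
}

/* One malloc for the whole data block, plus a vector of row pointers
   packed with displacements into it: the data itself is contiguous. */
double **matrix_packed(size_t rows, size_t cols)
{
   double **m = malloc(rows * sizeof(double *));
   double *data = malloc(rows * cols * sizeof(double));
   size_t i;
   if (m == NULL || data == NULL) return NULL;   /* cleanup omitted */
   for (i = 0; i < rows; i++)
      m[i] = data + i * cols;
   return m;
}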
One reason I like to dereference via the repacked row pointers is that
I think it is possible that a C compiler might handle the index
dereferencing arithmetic more efficiently than explicit arithmetic
expressions written out in the code. This is probably not true for
simple pointer arithmetic, which the compiler also recognizes, but when
the pointer expression starts to involve several variables multiplied
and summed I start to think that a matrix dereference might be faster.
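In other words, given a matrix packed as above, these two loops touch
exactly the same memory; the only question is which form the compiler
turns into better code (again just a sketch of mine, not benchmark
code):

#include <stddef.h>

/* Sum via the row-pointer vector: the compiler sees m[i][j]. */
double sum_by_index(double **m, size_t rows, size_t cols)
{
   double sum = 0.0;
   size_t i, j;
   for (i = 0; i < rows; i++)
      for (j = 0; j < cols; j++)
         sum += m[i][j];
   return sum;
}

/* Sum via explicit pointer arithmetic on the underlying data block. */
double sum_by_arithmetic(double *data, size_t rows, size_t cols)
{
   double sum = 0.0;
   size_t i, j;
   for (i = 0; i < rows; i++)
      for (j = 0; j < cols; j++)
         sum += *(data + i * cols + j);
   return sum;
}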
However, in my more limited direct experience with benchmark code the
difference, if any, is minuscule. I've implemented e.g. stream both
with global (statically allocated) vectors and with dynamically
allocated ones (via malloc, for user-variable vector sizes), compared
the two at the static stream size, and in general the numbers come out
the same within noise.
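That is, the same kernel run over vectors declared both ways, along
these lines (a sketch with made-up names and sizes, not the actual
stream source):

#include <stddef.h>

#define N 1000000

/* Statically allocated global vectors, sized at compile time. */
static double a[N], b[N], c[N];

void triad_static(double scalar)
{
   size_t i;
   for (i = 0; i < N; i++)
      a[i] = b[i] + scalar * c[i];
}

/* The same kernel over malloc'd vectors of user-chosen size. */
void triad_dynamic(double *x, double *y, double *z, size_t n, double scalar)
{
   size_t i;
   for (i = 0; i < n; i++)
      x[i] = y[i] + scalar * z[i];
}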
rgb
--
Robert G. Brown Phone(cell): 1-919-280-8443
Duke University Physics Dept, Box 90305
Durham, N.C. 27708-0305
Web: http://www.phy.duke.edu/~rgb
Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php
Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977