[Beowulf] DMA Memory Mapping Question
Scott Atchley
atchley at myri.com
Wed Feb 21 18:47:45 PST 2007
On Feb 21, 2007, at 7:45 PM, Chris Samuel wrote:
> Hi folks,
>
> We've got an IBM Power5 cluster running SLES9 and using the GM drivers.
>
> We occasionally get users who manage to use up all the DMA memory that is
> addressable by the Myrinet card through the Power5 hypervisor.
>
> Through various firmware and driver tweaks (thanks to both IBM and Myrinet)
> we've gotten that limit up to almost 1GB and then we use an undocumented
> environment variable (GMPI_MAX_LOCKED_MBYTE) to say only use 248MB of that
> per process (as we've got 4 cores in each box), which we enforce through
> Torque.
>
> The problems went away. Or at least they did until just now. :-(
>
> The characteristic error we get is:
>
> [13]: alloc_failed, not enough memory (Fatal Error)
> Context: <(gmpi_init) gmpi_dma_alloc: dma_recv buffers>
>
> Now Myrinet can handle running out of DMA memory once a process is running,
> but when it starts it must be able to allocate a (fairly trivial) amount of
> DMA memory otherwise you get that fatal error.
>
> Looking at the node I can confirm that there are only 3 user processes
> running, so what I am after is a way of determining how much of that DMA
> memory a process has allocated.
>
> I looked at /proc/${PID}/maps and saw this:
>
> 40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0
>
> which looks to me like a memory mapping, but one of just 1,000 bytes..
>
> Does anyone have any ideas at all ?
Isn't this in hex? If so, the range 40028000-40029000 spans 0x1000 bytes,
which is 4096 bytes. I do not use GM much and I do not know what this
mapping is. I just loaded GM on one node and, with no GM processes running
except the mapper, I see a similar entry (at a different address, but also
0x1000 long). My guess is that it allows GM and the mapper to communicate.
I will check internally.
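
To make the hex arithmetic concrete, here is a minimal Python sketch that
adds up the sizes of the /dev/gm0 entries in the text of a /proc/<pid>/maps
file. The function name gm_mapping_bytes is made up for illustration and is
not part of GM. Note the assumption: this only counts mappings of the device
file itself, and I do not know whether the DMA memory a process has
registered shows up there at all, so treat the result as a check of the
address arithmetic, not as a DMA usage figure.

```python
def gm_mapping_bytes(maps_text, device="/dev/gm0"):
    """Sum the sizes of all mappings of `device` in /proc/<pid>/maps text.

    Hypothetical helper for illustration only; it is not a GM tool.
    """
    total = 0
    for line in maps_text.splitlines():
        fields = line.split()
        # A file-backed mapping has six fields:
        # address range, perms, offset, dev, inode, path
        if len(fields) < 6 or fields[-1] != device:
            continue
        # The address range is two hexadecimal numbers joined by "-"
        start, end = (int(x, 16) for x in fields[0].split("-"))
        total += end - start
    return total


# The entry quoted above, joined onto one line
sample = "40028000-40029000 r--s 00002000 00:0c 8483 /dev/gm0"
print(gm_mapping_bytes(sample))  # 0x1000 = 4096
```

On a live node you would pass in the contents of /proc/<pid>/maps for each
of the running MPI processes, for example open("/proc/%d/maps" % pid).read().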
> Oh - switching to the Myrinet MX drivers (which don't have this problem) is
> not an option. We have an awful lot of users, mostly (non-computer)
> scientists, who have their own codes, and trying to persuade them to
> recompile would be very hard. Recompiling would be necessary as we've not
> been able to convince MPICH-GM to build shared libraries on Linux on Power
> with the IBM compilers. :-(
>
> cheers,
> Chris
I am sorry you have not been able to get MPICH-GM to build as shared
libraries. Have you sent email to Myricom help?
Regards,
Scott