[Beowulf] MPI shared memory errors
Brent M. Clements
bclem at rice.edu
Thu Jul 8 12:16:12 PDT 2004
Does anyone know how to fix the problem below? We have an idea or two but
want to get other admins' opinions.
Thanks,
Brent
Brent Clements
Linux Technology Specialist
Information Technology
Rice University
Linux at Rice news and information
available only at http://linuxsupport.rice.edu
---------- Forwarded message ----------
Date: Thu, 08 Jul 2004 13:15:18 -0500
From: Randy Crawford <rand at rice.edu>
To: Brent M. Clements <bclem at rice.edu>
Subject: Re: can you send me that error again?
When running two processes with MPI over Ethernet, the original error was:
"
p2_15517: (38.889341) xx_shmalloc: returning NULL; requested 65584
p2_15517: (38.889341) p4_shmalloc returning NULL; request = 65584 bytes
You can increase the amount of memory by setting the environment variable
P4_GLOBMEMSIZE (in bytes); the current size is 4194304
p2_15517: p4_error: alloc_p4_msg failed: 0
CHARMDEBUG> Processor 3 has PID 15518
CHARMDEBUG> Processor 1 has PID 13334
bm_list_13335: (39.139197) net_send: could not write to fd=5, errno =32
"
I then reset shmmax on all the nodes to be much higher, and I think
the failure then occurred at 128 KB.
Then I set P4_GLOBMEMSIZE to something like 2 GB (instead of 4 MB), and I got a
different error:
p0_6444: p4_error: exceeding max num of P4_MAX_SYSV_SHMIDS: 256
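
One possible reading of that last error, offered as a rough sketch rather than a confirmed diagnosis: if p4 builds its global memory pool out of fixed-size System V shared memory segments, then a very large P4_GLOBMEMSIZE needs more segments than the 256 named in the message. The limit of 256 comes from the error text above. The 1 MiB segment size is only an assumption about how this p4 build was configured, and the function names are made up for illustration.

```python
# Sketch: estimate how many shared memory segments a given
# P4_GLOBMEMSIZE would need, and compare against the limit
# reported in the p4_error message.

MAX_SYSV_SHMIDS = 256                 # taken from the error message
ASSUMED_SEGMENT_BYTES = 1024 * 1024   # ASSUMPTION: 1 MiB per segment; check your p4 build


def segments_needed(globmem_bytes, segment_bytes=ASSUMED_SEGMENT_BYTES):
    """Number of segments needed to cover globmem_bytes (ceiling division)."""
    return -(-globmem_bytes // segment_bytes)


def fits_shmid_limit(globmem_bytes,
                     segment_bytes=ASSUMED_SEGMENT_BYTES,
                     max_ids=MAX_SYSV_SHMIDS):
    """True if the pool can be built without exceeding the segment count limit."""
    return segments_needed(globmem_bytes, segment_bytes) <= max_ids


if __name__ == "__main__":
    for label, size in [("4 MB default", 4194304),
                        ("256 MB", 256 * 1024**2),
                        ("2 GB", 2 * 1024**3)]:
        n = segments_needed(size)
        status = "ok" if fits_shmid_limit(size) else "exceeds limit"
        print(f"{label}: {n} segments ({status})")
```

Under that assumption, 2 GB would need 2048 segments, well past 256, while anything up to 256 MB would fit. If the real segment size differs, the ceiling on P4_GLOBMEMSIZE scales with it.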