[bproc] MPI chokes
Jag
agrajag@linuxpower.org
Thu, 15 Mar 2001 08:10:41 -0800
On Thu, 15 Mar 2001, Arthur H. Edwards wrote:
> > Based on the error messages from your previous message, it looks like it
> > is trying to rfork to a node that is down. What does the output of
> > 'bpstat' on your cluster look like?
> >
> >
> > Jag
>
> Here is the output from bpstat
>
> jarrett/home/edwardsa>bpstat
> Node Address Status
> 0 192.168.1.100 up
> 1 192.168.1.101 up
> 2 192.168.1.102 up
> 3 192.168.1.103 up
> 4 192.168.1.104 up
> 5 192.168.1.105 up
> 6 192.168.1.106 up
> 7 192.168.1.107 down
> 8 192.168.1.108 down
> 9 192.168.1.109 down
<snip>
OK. You seem to be running Scyld's PREVIEW release (27BZ-6). At the
end of January, Scyld put out an actual release (27BZ-7), which
included updated software, including updates to beompi, Scyld's MPI
package.
I never tried to run MPI programs on the preview release, but my guess
is that it is getting confused by all the "down" nodes. I've played
with MPI on the 27BZ-7 release and have had no problems when there were
down nodes, so I would recommend that you upgrade to the latest
release.
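
To see at a glance how many nodes are reported "down", something like
the following could tally the status column. This is only a rough
sketch: it assumes text in the three-column layout quoted above (Node,
Address, Status), and count_by_status is a made-up helper name, not
part of bproc or beompi.

```python
def count_by_status(bpstat_text):
    """Tally node statuses from text shaped like the quoted bpstat output.

    Assumes each node line is "<number> <address> <status>"; anything
    else (such as the header line) is skipped.
    """
    counts = {}
    for line in bpstat_text.splitlines():
        fields = line.split()
        # Keep only lines that start with a numeric node index.
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        status = fields[2]
        counts[status] = counts.get(status, 0) + 1
    return counts


# Sample text copied from the layout quoted in this thread.
sample = """Node Address Status
0 192.168.1.100 up
6 192.168.1.106 up
7 192.168.1.107 down
"""

print(count_by_status(sample))
```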
Also, the reason you have so many "down" nodes is that you gave it a
large IP range to use for slave nodes. If you want fewer "down" nodes
(which are really just nodes that don't exist), use the beosetup
program, click on preferences, and adjust the IP range so that it
contains exactly as many IPs as there are slave nodes.
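
As a quick way to work out where the range should end, here is a small
sketch of the arithmetic. It assumes the range is contiguous and starts
at the first slave node's address; slave_ip_range is a hypothetical
helper for illustration, not something beosetup provides.

```python
import ipaddress


def slave_ip_range(first_ip, node_count):
    """Return (first, last) addresses of a range holding node_count IPs."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    first = ipaddress.IPv4Address(first_ip)
    # The range is inclusive, so the last address is node_count - 1 past the first.
    last = first + (node_count - 1)
    return str(first), str(last)


# Seven slave nodes starting at 192.168.1.100, as in the bpstat output above.
print(slave_ip_range("192.168.1.100", 7))
```

With seven slave nodes starting at 192.168.1.100, the range would end
at 192.168.1.106, which matches the nodes shown as "up" in the quoted
output.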
Hope this helps,
Jag