[Beowulf] Error while tstmachines still not solved
akhtar Rasool
akhtar_samo at yahoo.com
Mon Dec 27 00:55:08 PST 2004
Hi,
Actually, MPICH is installed on the root (server) node, so how would the other nodes be able to see the path to the MPI binaries? As you wrote, please let me know how the nodes would be able to see the executable program and the MPI libraries.
Whatever MPI program I run, it produces output, but the wall-clock time increases as the -np argument increases, because the tasks are not running on the other nodes, only on the server.
I am using a 2-node Linux 9 cluster with MPICH 1.2.5.2 as the MPI implementation.
I have to present my project on 30 December, so I would be grateful for help solving this problem.
Akhtar
Glen Gardner <Glen.Gardner at verizon.net> wrote:
The error in the 5th step is caused by a chatty login message. This makes MPI complain, but it ought to work anyway.
You want to turn off the motd (message of the day), and if you are using FreeBSD, create a file called ".hushlogin" in the user's home directory.
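A minimal sketch of that step, assuming the MPI jobs run under the same ordinary user account on every node (the thread does not name one). On both FreeBSD and Linux, login(1) skips the message of the day when ~/.hushlogin exists; a banner printed by the user's own startup files (for tcsh, .cshrc or .login) would instead have to be removed from those files.

```sh
#!/bin/sh
# Run once on each node, as the user that launches MPI jobs (assumed).
# An empty ~/.hushlogin tells login(1) not to print the motd for this user.
touch "$HOME/.hushlogin"

# A non-interactive remote command should produce no output at all.
# Uncomment on host1 to check the client (needs the real node):
# rsh -n 192.168.0.25 true
```

If that rsh check prints anything, something in the login path is still chatty.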
The next error has to do with the paths to MPICH and to the program being launched.
All the nodes need to be able to "see" the MPI binaries and the executable program.
The paths to MPI and to the program being launched need to be the same on all nodes, including the root node.
Make sure the path is set up properly in the environment. You may need to check your mount points and set up NFS properly.
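One common way to get identical paths, sketched under assumptions the thread does not confirm: host1 runs the Linux NFS server (nfs-utils) with its service already started, the cluster subnet is 192.168.0.0/24, the MPI user has the same UID on both nodes, the client can resolve the name host1, and nothing on the client's own /usr/local or /home needs to stay visible (the mounts hide it). Exporting /usr/local covers both the MPICH install (--prefix=/usr/local) and the build tree /usr/local/mpich/mpich-1.2.5.2 that tstmachines probed; /home covers user programs. These are root-only configuration steps, not a tested script.

```sh
# On host1 (NFS server), as root.
# 1. Add these two lines to /etc/exports:
#      /usr/local  192.168.0.0/255.255.255.0(ro,sync)
#      /home       192.168.0.0/255.255.255.0(rw,sync)
# 2. Re-read /etc/exports:
exportfs -ra

# On 192.168.0.25 (client), as root: mount at the SAME paths,
# so every node sees identical names.
mount -t nfs host1:/usr/local /usr/local
mount -t nfs host1:/home /home
# To keep the mounts across reboots, add to the client's /etc/fstab:
#      host1:/usr/local  /usr/local  nfs  defaults  0 0
#      host1:/home       /home       nfs  defaults  0 0

# On every node, for the MPI user (tcsh syntax, e.g. in ~/.cshrc),
# so the MPICH binaries installed under /usr/local/bin are found:
#      setenv PATH /usr/local/bin:${PATH}
```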
The last one probably has to do with name resolution.
The root node usually won't need to be in the machines.LINUX file, but all other nodes do.
I believe you need to list machines by hostname, not IP address, so be sure that both machines have the same hosts file, the same .rhosts, etc.
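A sketch of the name-resolution side. The name node1 for 192.168.0.25 and the HOST1_IP placeholder are hypothetical; the same /etc/hosts entries would go on both machines. check_machines is a hypothetical helper, not part of MPICH: it flags machines-file entries that are raw IP addresses or that the current node cannot resolve through getent hosts. Run it on each node, since each must resolve the names itself.

```sh
#!/bin/sh
# Example /etc/hosts entries, identical on BOTH nodes. "node1" is a
# hypothetical name for 192.168.0.25; replace HOST1_IP with host1's address.
#   HOST1_IP       host1
#   192.168.0.25   node1
# machines.LINUX would then hold hostnames only, e.g.:
#   node1

# Hypothetical checker (not part of MPICH): flag machines-file entries
# that are raw IP addresses or that this node cannot resolve.
check_machines() {
    file=${1:-machines.LINUX}
    bad=0
    while read -r entry rest; do
        host=${entry%%:*}             # drop an optional ":ncpus" suffix
        case $host in
            ''|'#'*) continue ;;      # skip blank lines and comments
        esac
        case $host in
            *[!0-9.]*) ;;             # has a non-digit, non-dot: a hostname
            *) echo "IP address, use a hostname: $host"; bad=1; continue ;;
        esac
        if ! getent hosts "$host" >/dev/null; then
            echo "cannot resolve: $host"; bad=1
        fi
    done < "$file"
    return $bad
}
```

For example, on host1: check_machines /path/to/machines.LINUX (the path depends on where MPICH keeps its machines file on your install).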
Glen
The next message indicates that the path to the executable "mpichfoo" was not found.
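The ls test from the quoted message below can be repeated by hand once the filesystem is shared. This sketch assumes what the error text suggests tstmachines does: create a probe file named mpichfoo in the current directory, then ask the remote host to ls the same absolute path. ls_test is a hypothetical helper, not part of MPICH; RSH defaults to rsh and can be set to another remote shell such as ssh.

```sh
#!/bin/sh
# Hypothetical helper (not part of MPICH): repeat the tstmachines
# "ls test" for one host. RSH defaults to rsh; set RSH=ssh if needed.
ls_test() {
    host=$1
    probe="$PWD/mpichfoo"           # same file name tstmachines used
    touch "$probe" || return 2      # create it on the local (server) side
    # Ask the remote node to list the SAME absolute path.
    if ${RSH:-rsh} -n "$host" /bin/ls "$probe" >/dev/null 2>&1; then
        echo "$host: sees $probe"
        rc=0
    else
        echo "$host: cannot see $probe (no common filesystem?)"
        rc=1
    fi
    rm -f "$probe"                  # clean up the probe file
    return $rc
}
```

For example, after sourcing it on host1: cd /usr/local/mpich/mpich-1.2.5.2 && ls_test 192.168.0.25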
akhtar Rasool wrote:
After extracting MPICH in /usr/local, I ran:
1- tcsh
2- ./configure --with-comm=shared --prefix=/usr/local
3- make
4- make install
5- util/tstmachines
In the 5th step, the error was:
Errors while trying to run rsh 192.168.0.25 -n /bin/ls /usr/local/mpich/mpich-1.2.5.2/mpichfoo
Unexpected response from 192.168.0.25:
> /bin/ls: /usr/local/mpich/mpich-1.2.5.2/mpichfoo: No such file or directory
The ls test failed on some machines.
This usually means that you do not have a common filesystem on all of the machines in your machines list; MPICH requires this for mpirun (it is possible to handle this in a procgroup file; see the manual).
Other possible problems include:
  The remote shell command rsh does not allow you to run ls.
    See the documentation about remote shell and rhosts.
  You have a common filesystem, but with inconsistent names.
    See the documentation on the automounter fix.
1 error was encountered while testing the machines list for LINUX.
Only these machines seem to be available:
host1
Since this is only a two-node cluster, host1 is the server on which MPICH is installed, and 192.168.0.25 is the client.
rsh logs in freely on both nodes.
On the server side, the file machines.LINUX contains:
  192.168.0.25
  host1
Kindly help
Akhtar
_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
-- 
Glen E. Gardner, Jr.
AA8C
AMSAT MEMBER 10593
Glen.Gardner at verizon.net
http://members.bellatlantic.net/~vze24qhw/index.html