[Beowulf] Transient NFS Problems in New Cluster
    Jon Forrest 
    jlforrest at berkeley.edu
       
    Tue Feb  2 14:00:37 PST 2010
    
    
  
I have a new cluster running CentOS 5.3.
The cluster uses a Sun 7310 storage server
that provides NFS service to the cluster over
a private 1 Gb/s Ethernet network with 9K
jumbo frames.
We've noticed that a number of the compute
nodes sometimes log the message

    automount[15023]: umount_autofs_indirect: ask umount returned busy /home

When this happens, the program running on that
node dies. This has happened between 10 and 20 times.
We don't know what is happening on a node when
this occurs. Most of the time everything is fine
and the home directories are automounted without
any problem.
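A small script can tally this message from the nodes' syslog files. That would show which nodes hit the problem and when. What follows is a minimal Python 3 sketch built on three assumptions:

- The logs use the classic syslogd line format (month, day, time, host, then "automount[pid]: ...").
- The log file is /var/log/messages, the CentOS 5 default.
- The host name "node01" and all function names are hypothetical.

```python
import re
from collections import Counter

# Assumed classic syslogd line format; "node01" is a hypothetical host name:
# Feb  2 14:00:37 node01 automount[15023]: umount_autofs_indirect: ask umount returned busy /home
BUSY_RE = re.compile(
    r"^(?P<month>[A-Z][a-z]{2})\s+(?P<day>\d{1,2})\s+(?P<time>\d\d:\d\d:\d\d)\s+"
    r"(?P<host>\S+)\s+automount\[\d+\]:\s+"
    r"umount_autofs_indirect: ask umount returned busy (?P<path>\S+)"
)


def find_busy_events(lines):
    """Return (host, timestamp, mount point) for every 'returned busy' line."""
    events = []
    for line in lines:
        m = BUSY_RE.match(line)
        if m:
            stamp = "%s %s %s" % (m.group("month"), m.group("day"), m.group("time"))
            events.append((m.group("host"), stamp, m.group("path")))
    return events


def count_busy_events(lines):
    """Tally 'returned busy' events per (host, mount point)."""
    return Counter((host, path) for host, _stamp, path in find_busy_events(lines))


def count_busy_in_file(path="/var/log/messages"):
    """Scan one syslog file; /var/log/messages is the assumed CentOS 5 default."""
    with open(path, errors="replace") as f:
        return count_busy_events(f)
```

Run count_busy_in_file() on each node, or on a central log host if syslog is forwarded, to get per-node counts. find_busy_events() returns timestamps, which can be matched against the times the jobs died.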
I've searched for this problem and found that other
people have seen it too, but I haven't found a
resolution, especially not for RHEL 5.
The auto.master line for this mount is:

    /home  /etc/auto.home  --timeout=1200 noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768

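As auto.master(5) describes it, the arguments after the map name fall into two groups:

- Arguments with a leading dash are options for automount itself.
- The rest are passed to mount as -o options.

Under that rule, --timeout=1200 sets the idle time in seconds before automount tries to unmount, and the comma-separated list goes to the NFS mount. The following is a minimal Python 3 sketch that splits this entry that way. It is a simplified reading of the man page, not the autofs parser; map types, quoting and other syntax are ignored, and the function names are hypothetical.

```python
def parse_master_entry(line):
    """Split one auto.master entry into its fields.

    Simplified reading of auto.master(5): field 1 is the mount point,
    field 2 the map; remaining arguments with a leading dash are options
    for automount itself, the others are comma-separated mount options.
    Map types, quoting and continuation lines are not handled.
    """
    fields = line.split()
    if len(fields) < 2:
        raise ValueError("expected at least a mount point and a map: %r" % line)
    rest = fields[2:]
    daemon_opts = [arg for arg in rest if arg.startswith("-")]
    mount_opts = []
    for arg in rest:
        if not arg.startswith("-"):
            mount_opts.extend(opt for opt in arg.split(",") if opt)
    return {
        "mount_point": fields[0],
        "map": fields[1],
        "daemon_opts": daemon_opts,
        "mount_opts": mount_opts,
    }


def idle_timeout_seconds(daemon_opts):
    """Return the --timeout=N value in seconds, or None if it is not set."""
    for opt in daemon_opts:
        if opt.startswith("--timeout="):
            return int(opt.split("=", 1)[1])
    return None


entry = parse_master_entry(
    "/home  /etc/auto.home  --timeout=1200 "
    "noatime,nodiratime,rw,noacl,rsize=32768,wsize=32768"
)
print(idle_timeout_seconds(entry["daemon_opts"]) // 60, "minutes")  # 20 minutes
```

So an automounted home directory that sits idle for 20 minutes becomes a candidate for unmounting.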
The network interface configuration is:

    eth0      Link encap:Ethernet  HWaddr 00:30:48:B9:F6:52
              inet addr:10.1.255.233  Bcast:10.1.255.255  Mask:255.255.0.0
              inet6 addr: fe80::230:48ff:feb9:f652/64 Scope:Link
              UP BROADCAST RUNNING MULTICAST  MTU:9000  Metric:1
              RX packets:32999308 errors:0 dropped:0 overruns:0 frame:0
              TX packets:27468315 errors:0 dropped:0 overruns:0 carrier:0
              collisions:0 txqueuelen:1000
              RX bytes:24225053296 (22.5 GiB)  TX bytes:73313582546 (68.2 GiB)
              Interrupt:74 Base address:0x2000

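The same check can be repeated on every node. The following is a minimal Python 3 sketch that pulls the MTU and the RX/TX error, drop and overrun counters out of net-tools ifconfig text. It flags anything other than MTU 9000 with zero errors.

The function names are hypothetical. The 9000 default is taken from the jumbo-frame setup described above. The check only sees this node's end of the link: a clean result says nothing about the MTU on the switch or the 7310, and all of them must agree for 9K frames to work.

```python
import re

# Matches e.g. "RX packets:32999308 errors:0 dropped:0 overruns:0"
COUNTER_RE = r"%s packets:(\d+) errors:(\d+) dropped:(\d+) overruns:(\d+)"


def interface_summary(ifconfig_text):
    """Extract the MTU and RX/TX counters from net-tools ifconfig output."""
    mtu = re.search(r"MTU:(\d+)", ifconfig_text)
    rx = re.search(COUNTER_RE % "RX", ifconfig_text)
    tx = re.search(COUNTER_RE % "TX", ifconfig_text)
    if not (mtu and rx and tx):
        raise ValueError("unrecognized ifconfig output")
    summary = {"mtu": int(mtu.group(1))}
    for prefix, m in (("rx", rx), ("tx", tx)):
        packets, errors, dropped, overruns = (int(g) for g in m.groups())
        summary[prefix + "_packets"] = packets
        summary[prefix + "_errors"] = errors
        summary[prefix + "_dropped"] = dropped
        summary[prefix + "_overruns"] = overruns
    return summary


def link_problems(summary, expected_mtu=9000):
    """List anything that looks wrong on this node's end of the link."""
    issues = []
    if summary["mtu"] != expected_mtu:
        issues.append("MTU is %d, expected %d" % (summary["mtu"], expected_mtu))
    for key in ("rx_errors", "rx_dropped", "rx_overruns",
                "tx_errors", "tx_dropped", "tx_overruns"):
        if summary[key]:
            issues.append("%s = %d" % (key, summary[key]))
    return issues
```

For the eth0 output shown, link_problems(interface_summary(text)) returns an empty list.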
Any advice on what to do?
Cordially,
-- 
Jon Forrest
Research Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
    
    