[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster
Narisara Thongboonchoo
nthongbo@cgrer.uiowa.edu
Thu Sep 5 16:30:00 2002
Dear realtek list users,
I am having trouble with a 4-node Linux cluster when running a program w/ MPI
over ssh. I cannot finish my job because one of the 4 nodes keeps dying at
random. The job is then killed because there is no route to that machine.
I'm not sure why it happens, but I found error messages mentioning
rtl8139_tx_interrupt & rtl8139_interrupt. Is it possible that network communication
causes this problem? If so, could you give me any suggestions?
Regards,
Narisara
My system uses Red Hat 7.3 on P4 1.6 GHz CPUs. The 4 nodes are Soyo P4VDA motherboards
w/ onboard Realtek 8139 LAN & VIA P4X266A chipset, and 512 MB of DDR.
The master is a Soyo P4S Dragon Ultra MB w/ onboard SiS 900/7016 LAN & SiS 645 chipset
and 1.5 GB of DDR RAM. The network switch is a Netgear Fast Ethernet FS 105.
--------------------------------------------------------------------------------
*pde = 00000000
Oops: 0000
CPU: 0
binfmt_misc nfsd autofs nfs lockd sunrpc 8139to mii ide-scsi_mod ide-cd
EIP: 0010:[<c0109db4>] Not tainted
EFLAGS: 00010002
eax: 00000000 ebx: 00000160 ecx: 064d600b edx: 00000018
esi: 0000000b edi: c0322a60 ebp: 00000000 esp: de83defc
ds: 0018 es: 0018 ss: 0018
Process mm5.mpp (pid: 1139 stackpage=de83d000)
Stack: 0000000b dfd14560 064d6008 00000003 e0993000 c021e2a4 dfd14560 064d600b
001605ea 064d6008 00000003 e0993000 00000000 de830018 ded10018 ffffff0b
e098e308 00000010 00000246 00000004 e0993000 dfd14400 dfd14560 e098e91a
Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
[<e098e91a>] rtl8139_interrupt [8139too] 0xba
[<c0109c7a>] handle_IRQ_event [kernel] 0x3a
[<c0109df8>] do_IRQ [kernel] 0x68
Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
<0> kernel panic: Aiee, killing interrupt handler!
In interrupt handler -not syncing
________________________________________________________________________________