[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster
Donald Becker
becker@scyld.com
Sat Sep 14 10:37:01 2002
On Thu, 5 Sep 2002, Narisara Thongboonchoo wrote:
> I had trouble with a 4-node Linux cluster when running a program with MPI
> and ssh. However, I couldn't finish my job because one of the 4 nodes
> kept dying at random.
The same node, or different nodes?
If it's the same node every time, you shouldn't be looking for a
software fix.
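One way to answer the question above is to keep a record of which node dropped out on each failed run and tally it. The following is only a minimal sketch: the node names and the `failures` list are hypothetical, not data from this cluster.

```python
from collections import Counter

def tally_failures(failed_nodes):
    """Count how often each node name appears in the list of failures."""
    return Counter(failed_nodes)

def same_node_every_time(failed_nodes):
    """Return True when every recorded failure names the same node."""
    return len(set(failed_nodes)) == 1

# Hypothetical record: one entry per killed job, naming the node that died.
failures = ["node3", "node3", "node3"]

print(tally_failures(failures))
print(same_node_every_time(failures))
```

If a single node accounts for every failure, that points toward hardware on that machine rather than software.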
> The job was killed because there was no route to that machine. I'm not
> sure why it happened, but I found error messages about
> rtl8139_tx_interrupt & rtl8139_interrupt. Is it possible that network
> communication causes this problem? If so, could you give me any
> suggestions?
If this isn't a memory problem, then it's a device driver problem. No
user-level software should be able to cause this type of kernel error.
> Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
> [<e098e91a>] rtl8139_interrupt [8139too] 0xba
> [<c0109c7a>] handle_IRQ_event [kernel] 0x3a
> [<c0109df8>] do_IRQ [kernel] 0x68
>
> Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
> <0> kernel panic: Aiee, killing interrupt handler!
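To see at a glance which module each frame of a trace like the one quoted above belongs to, the lines can be split into address, symbol, module, and offset. This is a rough sketch that assumes the frame layout shown in the quoted message; the `FRAME` pattern and `parse_trace` helper are illustrative names, not part of any kernel tool.

```python
import re

# Matches frames such as "[<e098e308>] rtl8139_tx_interrupt [8139too] 0x128".
FRAME = re.compile(r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[([^\]]+)\]\s+(0x[0-9a-f]+)")

def parse_trace(text):
    """Return (address, symbol, module, offset) tuples for each frame found."""
    return FRAME.findall(text)

# The call trace exactly as quoted in the message, including the "> " prefixes.
trace = """\
> Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
> [<e098e91a>] rtl8139_interrupt [8139too] 0xba
> [<c0109c7a>] handle_IRQ_event [kernel] 0x3a
> [<c0109df8>] do_IRQ [kernel] 0x68
"""

for address, symbol, module, offset in parse_trace(trace):
    print(f"{module:8} {symbol} +{offset}")
```

Here the two innermost frames are in the 8139too module, and the remaining two are the generic kernel interrupt path.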
--
Donald Becker becker@scyld.com
Scyld Computing Corporation http://www.scyld.com
410 Severn Ave. Suite 210 Second Generation Beowulf Clusters
Annapolis MD 21403 410-990-9993