[realtek] rtl8139_tx_interrupt [8139too] problem in Linux cluster
Narisara Thongboonchoo
nthongbo@cgrer.uiowa.edu
Wed Sep 18 11:10:01 2002
Dear Donald and Realtek users,
All four of my nodes went down at random, one by one. I increased the memory
from 512 MB to 1 GB, but the nodes still go down with the same error message.
I also changed the network driver to Netgear, but that didn't solve the
problem. Could you give me any suggestions?
Regards,
Narisara
On Sat, 14 Sep 2002, Donald Becker wrote:
> On Thu, 5 Sep 2002, Narisara Thongboonchoo wrote:
>
> > I had trouble with a 4-node Linux cluster when running a program with MPI
> > and ssh. I couldn't finish my job because one of the 4 nodes kept dying
> > at random.
>
> The same node, or different nodes?
> If it's the same node every time, you shouldn't be looking for a
> software fix.
>
> > The job was killed because there was no route to that machine. I'm not
> > sure why it happened, but I found error messages about
> > rtl8139_tx_interrupt & rtl8139_interrupt. Is it possible that network
> > communication causes this problem? If so, could you give me any
> > suggestions?
>
> If this isn't a memory problem, then it's a device driver problem. No
> user-level software should be able to cause this type of kernel error.
>
> > Call Trace: [<e098e308>] rtl8139_tx_interrupt [8139too] 0x128
> > [<e098e91a>] rtl8139_interrupt [8139too] 0xba
> > [<c0109c7a>] handle_IRQ_event [kernel] 0x3a
> > [<c0109df8>] do_IRQ [kernel] 0x68
> >
> > Code: ff 50 14 8b 00 29 32 c0 83 e0 d7 83 c8 04 5a a9 03 00 00
> > <0> kernel panic: Aiee, killing interrupt handler!
>
>
--
Narisara Thongboonchoo
326 Hawkeye Drive
Iowa City, IA 52246, USA
Tel&Fax : (319) 353-4797 (home)
        : (319) 335-2063 (Office)
          (319) 335-3335 (Lab)