[eepro100] wait_for_cmd_done timeout
Donald Becker
becker@scyld.com
Tue Mar 5 14:58:01 2002
On Tue, 5 Mar 2002, Wilson, John wrote:
> I am seeing a problem with wait_for_cmd_done that is very similar to the timeout
> issue that I found on GeoCrawler.
...
> In summary: It appears to me that the network is being flooded with ICMP
> traffic (and possibly other traffic) and that the eepro100 may not be
> handling the errors/traffic. (I'm new to Linux device drivers, so please
> bear with me here).
There are a bunch of errors reported here. The device driver does not
cause the errors -- it only reports them.
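To illustrate the distinction: a message like "wait_for_cmd_done timeout!" is what a bounded polling loop prints when the hardware never signals that it has accepted the previous command. The sketch below is a Python model of that pattern, not the driver's code. The `read_cmd_byte` callable, the spin limit of 1000, and the fake chip are all assumptions made up for illustration.

```python
def wait_for_cmd_done(read_cmd_byte, max_spins=1000):
    """Poll until the command byte reads zero or the spin budget runs out.

    Returns True if the (simulated) device accepted the command,
    False on timeout. read_cmd_byte is a hypothetical callable that
    stands in for a register read.
    """
    for _ in range(max_spins):
        if read_cmd_byte() == 0:
            return True
    # The loop did not cause the stall; it only notices and reports it.
    print("wait_for_cmd_done timeout!")
    return False


def make_fake_chip(busy_reads):
    """Return a reader that reports 'busy' (nonzero) for busy_reads polls."""
    state = {"left": busy_reads}

    def read():
        if state["left"] > 0:
            state["left"] -= 1
            return 0x10  # arbitrary nonzero value meaning "still busy"
        return 0

    return read


# A chip that clears after a few polls, and one that never clears in time.
healthy = wait_for_cmd_done(make_fake_chip(busy_reads=3))
stalled = wait_for_cmd_done(make_fake_chip(busy_reads=10**6))
```

In the stalled case the loop prints the message and gives up. Whatever kept the chip busy lies outside the loop.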
> I'm running:
> RH 7.2
> Kernel 2.4.9-13 modified to support the ATM device drivers (eni and
> FORE (Marconi))
> ATM on Linux support software: linux-atm-2.4.0
> Samba
...
> Mar 5 09:05:14 sla2 kernel: eepro100: wait_for_cmd_done timeout!
> Mar 5 09:05:46 sla2 last message repeated 24 times
> Mar 5 09:05:48 sla2 last message repeated 3 times
> Mar 5 09:05:49 sla2 kernel: NETDEV WATCHDOG: eth0: transmit timed out
> Mar 5 09:05:49 sla2 kernel: eth0: Transmit timed out: status 0050 0c80 at
> 48699/48728 command 00030000.
You should run eepro100-diag to see more chip status information.
Nothing is obviously wrong from this report.
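When comparing several of these reports, it can help to split the "Transmit timed out" line into its fields. The following Python sketch does that for the line quoted above. The key names are assumptions: the two counters separated by "/" appear to be Tx ring positions, so their difference is labeled `pending` here, but that reading is not confirmed by anything in this message.

```python
import re

# The log line from the report, rejoined after the mail client wrapped it.
LINE = ("eth0: Transmit timed out: status 0050 0c80 at "
        "48699/48728 command 00030000.")

PATTERN = re.compile(
    r"status (?P<status>[0-9a-f]{4}) (?P<cmd>[0-9a-f]{4}) at "
    r"(?P<first>\d+)/(?P<second>\d+) command (?P<word>[0-9a-f]{8})",
    re.IGNORECASE,
)


def parse_tx_timeout(line):
    """Return the numeric fields of a timeout line, or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    first = int(m.group("first"))
    second = int(m.group("second"))
    return {
        "status": int(m.group("status"), 16),
        "cmd": int(m.group("cmd"), 16),
        "first": first,
        "second": second,
        # Assumed meaning: entries queued but not yet completed.
        "pending": second - first,
        "word": int(m.group("word"), 16),
    }


info = parse_tx_timeout(LINE)
```

The hex words still need eepro100-diag or the chip documentation to interpret. The parser only makes them easier to compare across reports.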
> Mar 5 09:06:22 sla2 kernel: eni(itf 0): TX DMA full
> Mar 5 09:06:23 sla2 last message repeated 7 times
> Mar 5 09:06:24 sla2 kernel: eni(itf 0): TX DMA full
>
> At this point both the eth0 interface and atm0 interface stop working. Note
> that the eepro100 times out first and then the eni driver also dies with TX
> DMA full error.
Yup. That indicates that there is a system problem that affects both
devices.
> ifconfig shows:
> eth0 Link encap:Ethernet HWaddr 00:50:8B:D3:92:7C
> RX packets:504622 errors:0 dropped:0 overruns:0 frame:0
> TX packets:47444 errors:289 dropped:0 overruns:0 carrier:0
> collisions:1416 txqueuelen:100
> RX bytes:50644863 (48.2 Mb) TX bytes:10479503 (9.9 Mb)
> Note the collisions are on eth0.
What type of link partner? What does 'mii-diag' or 'eepro100-diag -m'
report?
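As a rough sanity check on the counters quoted above, this Python sketch pulls the TX line and the collision count out of the ifconfig text and expresses both as a share of transmitted packets. Collisions are only expected on a half-duplex link, which is why the link partner and the negotiated mode matter. The function name and the regular expressions are illustrative, not part of any tool.

```python
import re

# Counters copied from the ifconfig output in the report.
IFCONFIG = """\
eth0 Link encap:Ethernet HWaddr 00:50:8B:D3:92:7C
RX packets:504622 errors:0 dropped:0 overruns:0 frame:0
TX packets:47444 errors:289 dropped:0 overruns:0 carrier:0
collisions:1416 txqueuelen:100
"""


def tx_ratios(text):
    """Return TX error and collision percentages, or None if unparsable."""
    tx = re.search(r"TX packets:(\d+) errors:(\d+)", text)
    col = re.search(r"collisions:(\d+)", text)
    if tx is None or col is None:
        return None
    packets = int(tx.group(1))
    if packets == 0:
        return None  # avoid dividing by zero on an idle interface
    errors = int(tx.group(2))
    collisions = int(col.group(1))
    return {
        "tx_packets": packets,
        "error_pct": 100.0 * errors / packets,
        "collision_pct": 100.0 * collisions / packets,
    }


ratios = tx_ratios(IFCONFIG)
```

For these numbers the result is roughly 0.6% transmit errors and 3% collisions. The figures say nothing about the cause on their own, but they give a baseline to compare against after changing the link settings or the driver.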
> I wanted to point out that the eepro100 is timing out and is affecting the
> ATM device driver too.
That's not likely what is happening. While the eepro100 driver is
encountering a problem that causes a timeout, the system workload is
reduced. Even so, the ATM device driver is reporting a problem. It
seems more likely that both problems are caused by a third source.
> The eepro100 version is:
> "eepro100.c:v1.09j-t 9/29/99 Donald Becker
> http://cesdis.gsfc.nasa.gov/linux/drivers/eepro100.html\n"
Grrr, they still refuse to update the URL.
> "eepro100.c: $Revision: 1.36 $ 2000/11/17 Modified by Andrey V. Savochkin
> <saw@saw.sw.com.sg> and others\n";
>
> I know there is a lot of info here, but after reading the thread on the
> wait_for_cmd_done, I thought this might shed some light on the problem and
> that it may not be confined to the newer/experimental kernels.
>
> Any help would be much appreciated.
Have you tried the driver from one of these locations?
  http://www.scyld.com/network/eepro100.html
  ftp://www.scyld.com/pub/network/eepro100.c
It might not solve the system problem, but it is more likely to report
useful diagnostic information.
Donald Becker                   becker@scyld.com
Scyld Computing Corporation     http://www.scyld.com
410 Severn Ave. Suite 210       Second Generation Beowulf Clusters
Annapolis MD 21403              410-990-9993