[eepro100] Transmitter Timeout -- addendum
chris
chris@soma.978.org
Sun, 30 Jul 2000 06:04:26 -0700
A quick recap of my hardware:
* i82557 quad 64-bit PCI (33 MHz) Ethernet card
* DEC PC164 motherboard with 21164 EV56 processor
I've been messing with eepro100 drivers for about 32 hours straight now
(with a few hours off for pizza), and as an addendum to my last e-mail,
this is what I have tried and found thus far:
* The TX timeout is not dependent on what the card is connected to
after all. Regardless of whether it is connected to a 3c905, Bay 350T,
UB 100-tx hub, or tulip card, the TX timeout still happens. The
timeout just happens a little quicker when connected via crossover
cable to a 905b.
* All cabling is tried and true on other network cards.
* The TX timeout occurs on just about all heavy traffic. The initial
timeout (meaning the first timeout since boot) takes a little
while to happen, but successive timeouts come quicker. Here is a
quick table of how the timeouts occur with the different driver
versions:
Traffic                   Driver  Kernel       Initial       Successive      Recovery
                          version version      timeout (sec) timeouts (sec)  time (sec)
heavy NFS read/writes     1.06    2.2.14       25-30         8-10            1-2
mpeg streaming via Samba  1.06    2.2.14       35-40         12-15           1-2
HEAVY FTP                 1.06    2.2.14       IMMEDIATE     1-2             4-5
telnet/ssh/http           1.06    2.2.14       NONE          -               -
heavy NFS read/writes     1.09    2.2.14       30-45         10-12           8-10
mpeg streaming via Samba  1.09    2.2.14       115-140       15-20           8-10
HEAVY FTP                 1.09    2.2.14       IMMEDIATE     <1              1-2
telnet/ssh/http           1.09    2.2.14       NONE          -               -
heavy NFS read/writes     1.09    2.2.16       30-45         10-12           8-10
mpeg streaming via Samba  1.09    2.2.16       115-140       15-20           8-10
HEAVY FTP                 1.09    2.2.16       IMMEDIATE     <1              1-2
telnet/ssh/http           1.09    2.2.16       30 minutes    ???             a long time
ALL                       1.09    2.4.0-test5  N/A*
*=OS locks IMMEDIATELY after reaching the eepro100 code when compiled into
the kernel, or upon insmod when running as a module, with NO ERROR
MESSAGES.
MESSAGES:
On v1.06 of the driver, this is what /var/log/messages says:
Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: status 0050 0000 at 322796/322810 command 000c0000.
Jul 25 09:59:12 fosters kernel: eth0: Trying to restart the transmitter...
On v1.09 of the driver this is what /var/log/messages says:
Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: status 0050 0c00 at 107640/107670 command 200c0000.
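For anyone who wants to compare these messages across driver versions, here is a small Python sketch that pulls the fields out of the two timeout lines quoted above. It assumes the two numbers after "at" are the driver's dirty_tx/cur_tx ring counters; that is an assumption about the message format, not something stated in this post. The function and field names are made up for illustration.

```python
import re

# Pattern for the "Transmit timed out" message as it appears in the log.
# ASSUMPTION: the two numbers after "at" are dirty_tx/cur_tx, i.e. the
# oldest unreclaimed transmit entry and the next entry to be queued.
TIMEOUT_RE = re.compile(
    r"(?P<iface>eth\d+): Transmit timed out: "
    r"status (?P<status>[0-9a-f]{4}) (?P<cmd>[0-9a-f]{4}) "
    r"at (?P<dirty>\d+)/(?P<cur>\d+) "
    r"command (?P<txcmd>[0-9a-f]{8})"
)


def parse_timeout(line):
    """Return the fields of one timeout message, or None if it does not match.

    Whitespace (including the line wrap added by the mail client) is
    collapsed to single spaces before matching.
    """
    match = TIMEOUT_RE.search(" ".join(line.split()))
    if match is None:
        return None
    fields = match.groupdict()
    fields["dirty"] = int(fields["dirty"])
    fields["cur"] = int(fields["cur"])
    # Entries queued but not yet reclaimed when the timeout fired.
    fields["pending"] = fields["cur"] - fields["dirty"]
    return fields


SAMPLES = [
    # v1.06, wrapped exactly as it arrived in the mail
    "Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: status 0050\n"
    "0000 at 322796/322810 command 000c0000.",
    # v1.09
    "Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: status 0050\n"
    "0c00 at 107640/107670 command 200c0000.",
]

for sample in SAMPLES:
    info = parse_timeout(sample)
    print(info["iface"], info["status"], info["cmd"], info["pending"])
```

If the counter assumption holds, the difference is 14 entries for the v1.06 message and 30 for the v1.09 message.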
BOOT MESSAGE:
Jul 29 22:39:31 fosters kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0x9000, 00:08:C7:91:08:72, IRQ 17.
Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel: General self-test: passed.
Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0x9800, 00:08:C7:91:08:73, IRQ 24.
Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel: General self-test: passed.
Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth2: OEM i82557/i82558 10/100 Ethernet at 0xa000, 00:08:C7:66:80:F7, IRQ 28.
Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel: General self-test: passed.
Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
Jul 29 22:39:31 fosters kernel: eth3: OEM i82557/i82558 10/100 Ethernet at 0xa800, 00:08:C7:66:80:0F, IRQ 32.
Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
Jul 29 22:39:31 fosters kernel: General self-test: passed.
Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
PCI:
There don't seem to be any PCI conflicts, and I tried both enabling and
disabling "PCI quirks" in the kernel to no avail.
Here is a cat of my /proc/pci:
PCI devices found:
Bus 0, device 7, function 0:
PCI bridge: DEC DC21154 (rev 2).
Medium devsel. Fast back-to-back capable. Master Capable. Latency=32. Min Gnt=4.
Bus 0, device 8, function 0:
Non-VGA device: Intel 82378IB (rev 67).
Medium devsel. Master Capable. No bursts.
Bus 0, device 9, function 0:
VGA compatible controller: Matrox Millennium (rev 1).
Medium devsel. Fast back-to-back capable. IRQ 19.
Non-prefetchable 32 bit memory at 0x9000000 [0x9000000].
Non-prefetchable 32 bit memory at 0x9800000 [0x9800000].
Bus 0, device 11, function 0:
IDE interface: CMD 646 (rev 1).
Medium devsel. Fast back-to-back capable. IRQ 21. Master Capable. Latency=64. Min Gnt=2.Max Lat=4.
I/O at 0x8000 [0x8001].
Bus 1, device 4, function 0:
Ethernet controller: Intel 82557 (rev 5).
Medium devsel. Fast back-to-back capable. IRQ 17. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xa000000 [0xa000000].
I/O at 0x9000 [0x9001].
Non-prefetchable 32 bit memory at 0xa100000 [0xa100000].
Bus 1, device 5, function 0:
Ethernet controller: Intel 82557 (rev 5).
Medium devsel. Fast back-to-back capable. IRQ 24. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xa200000 [0xa200000].
I/O at 0x9800 [0x9801].
Non-prefetchable 32 bit memory at 0xa300000 [0xa300000].
Bus 1, device 6, function 0:
Ethernet controller: Intel 82557 (rev 5).
Medium devsel. Fast back-to-back capable. IRQ 28. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xa400000 [0xa400000].
I/O at 0xa000 [0xa001].
Non-prefetchable 32 bit memory at 0xa500000 [0xa500000].
Bus 1, device 7, function 0:
Ethernet controller: Intel 82557 (rev 5).
Medium devsel. Fast back-to-back capable. IRQ 32. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
Non-prefetchable 32 bit memory at 0xa600000 [0xa600000].
I/O at 0xa800 [0xa801].
Non-prefetchable 32 bit memory at 0xa700000 [0xa700000].
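The "no conflicts" reading of the listing above can be double-checked mechanically. The following Python sketch walks /proc/pci-style text and reports any IRQ claimed by more than one device. It only understands the "Bus N, device M, function F:" and "IRQ N." patterns seen in this listing, and the helper names are hypothetical.

```python
import re
from collections import defaultdict

DEV_RE = re.compile(r"Bus\s+(\d+), device\s+(\d+), function\s+(\d+):")
IRQ_RE = re.compile(r"IRQ (\d+)\.")


def irq_map(text):
    """Map each IRQ number to the (bus, device, function) tuples using it."""
    irqs = defaultdict(list)
    current = None  # device whose description lines we are reading
    for line in text.splitlines():
        dev = DEV_RE.search(line)
        if dev:
            current = tuple(int(x) for x in dev.groups())
            continue
        irq = IRQ_RE.search(line)
        if irq and current is not None:
            irqs[int(irq.group(1))].append(current)
    return dict(irqs)


def shared_irqs(text):
    """Return only the IRQs that more than one device reports."""
    return {irq: devs for irq, devs in irq_map(text).items() if len(devs) > 1}


# Abbreviated copy of the listing above: device headers and IRQ lines only.
PROC_PCI = """\
Bus 0, device 7, function 0:
PCI bridge: DEC DC21154 (rev 2).
Medium devsel. Fast back-to-back capable. Master Capable.
Bus 0, device 8, function 0:
Non-VGA device: Intel 82378IB (rev 67).
Bus 0, device 9, function 0:
Medium devsel. Fast back-to-back capable. IRQ 19.
Bus 0, device 11, function 0:
Medium devsel. Fast back-to-back capable. IRQ 21. Master
Bus 1, device 4, function 0:
Medium devsel. Fast back-to-back capable. IRQ 17. Master
Bus 1, device 5, function 0:
Medium devsel. Fast back-to-back capable. IRQ 24. Master
Bus 1, device 6, function 0:
Medium devsel. Fast back-to-back capable. IRQ 28. Master
Bus 1, device 7, function 0:
Medium devsel. Fast back-to-back capable. IRQ 32. Master
"""

print("shared IRQs:", shared_irqs(PROC_PCI))
```

On this listing every device that reports an IRQ has its own, so the shared set comes back empty.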
There don't seem to be any I/O issues either. Here is a cat of /proc/ioports:
0060-006f : keyboard
0070-007f : timer
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(auto)
8000-8007 : ide0
8008-800f : ide1
a000000-a00001f : Intel Speedo3 Ethernet
a200000-a20001f : Intel Speedo3 Ethernet
a400000-a40001f : Intel Speedo3 Ethernet
a600000-a60001f : Intel Speedo3 Ethernet
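The same kind of check works for the port ranges. This Python sketch parses the flat "start-end : name" lines shown above and flags any range that begins inside an earlier one. It assumes the flat 2.2-style layout of this listing, and the function names are made up for illustration.

```python
def parse_ranges(text):
    """Parse 'start-end : name' lines (hex addresses) into tuples."""
    ranges = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        span, name = line.split(":", 1)
        start, end = (int(part, 16) for part in span.strip().split("-"))
        ranges.append((start, end, name.strip()))
    return ranges


def overlaps(ranges):
    """Return (earlier, later) pairs where 'later' starts inside 'earlier'.

    Ranges are sorted by start address; each one is compared against the
    previously seen range that reaches the furthest.
    """
    found = []
    widest = None
    for current in sorted(ranges):
        if widest is not None and current[0] <= widest[1]:
            found.append((widest, current))
        if widest is None or current[1] > widest[1]:
            widest = current
    return found


IOPORTS = """\
0060-006f : keyboard
0070-007f : timer
0170-0177 : ide1
01f0-01f7 : ide0
02f8-02ff : serial(auto)
0376-0376 : ide1
03c0-03df : vga+
03e8-03ef : serial(auto)
03f6-03f6 : ide0
03f8-03ff : serial(auto)
8000-8007 : ide0
8008-800f : ide1
a000000-a00001f : Intel Speedo3 Ethernet
a200000-a20001f : Intel Speedo3 Ethernet
a400000-a40001f : Intel Speedo3 Ethernet
a600000-a60001f : Intel Speedo3 Ethernet
"""

print("overlapping ranges:", overlaps(parse_ranges(IOPORTS)))
```

For the listing above no range starts inside another, which matches the observation that nothing collides.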
TRIAL-N-ERROR:
Forcing different interface speeds via mii-diag does not fix anything:
100baseTX-FD -- timeout still occurs
100baseTX-HD -- timeout still occurs
10baseT-FD -- timeout still occurs
10baseT-HD -- timeout still occurs
eepro100-diag:
eepro100-diag.c:v2.02 7/19/2000 Donald Becker (becker@scyld.com)
http://www.scyld.com/diag/index.html
Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9000.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9800.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #3: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa000.
A potential i82557 chip has been found, but it appears to be active.
Either shutdown the network, or use the '-f' flag.
Index #4: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa800.
Changing MACROS:
v1.06:
txfifo/rxfifo: changes do nothing
TX_RING_SIZE/RX_RING_SIZE: changes do nothing
TX_TIMEOUT: Increasing this number decreases the frequency of the
timeouts until the number reaches roughly double its original value;
beyond that, the interfaces are not usable until an ifdown/ifup.
v1.09:
txfifo/rxfifo: changes do nothing
TX_RING_SIZE/RX_RING_SIZE: changes do nothing
TX_TIMEOUT: Increasing this number at all makes the interfaces unusable
until an ifdown/ifup.
Also, I ported the code from v1.09 to v1.06 for the function "static
void speedo_tx_timeout(struct net_device *dev)" to see what happens --
the new "hybrid" driver exhibited the characteristics of the v1.09
timeouts.
Lastly, changing txqueuelen via ifconfig does nothing.
Conclusion:
v1.06 of the driver seemed to handle the TX timeouts quicker than
v1.09, but in v1.09 they were less frequent. I tried to compile v1.10
and experimental v1.11, but I got all kinds of compile errors and did
not have the motivation to port them to v2.2.16 of the kernel after all
my above failures.
I have NO IDEA what is causing these TX timeouts. If any of the
gurus here would be so kind as to aid me in my efforts to figure this
out, I would greatly appreciate it! I will grant accounts on the
troublesome machine if that will aid in troubleshooting, and I will
code whatever I can if anyone can give me a direction to go in.
Is there anything special that I have to set in the kernel for 64-bit
PCI, BTW?
Could the fact that this card is a 64-bit PCI card be the issue?
Are there any special parameters that I could try tweaking that are
alpha-specific?
Thank you for any help!!
--Chris