[eepro100] Transmitter Timeout -- addendum
Kallol Biswas
kallol@bugula.fpk.hp.com
Sun, 30 Jul 2000 10:41:29 EDT
I don't know about the latest eepro100 driver, but the version
I saw had a fundamental design problem. Again, I will try to explain.
The 82559 prefetches the next command from the command ring.
Suppose the command unit (CU) is executing the ith command and has
already prefetched the next one, i.e. the (i+1)th. The driver then
sets up the (i+1)th command, sets its S (suspend) bit and issues RESUME.
If the CU is:
* in the Suspended state, it goes to the Active state. It does not re-read
  the next link pointer (the address of the (i+1)th command); it re-reads
  the S bit of the ith command. If the S bit of the ith command is cleared,
  it executes the (i+1)th command; otherwise it goes back to the Suspended
  state.
* Active, it re-checks the S bits of the next ((i+1)th) and present (ith)
  commands (it uses PCI command 0x6, Memory Read, to re-read the S bit of a
  TxCB; I saw this on an analyzer).
Please note that nothing says the CU re-analyzes the next ((i+1)th)
command; it only re-reads the S bit.
So if the (i+1)th command was previously executed as, say, a transmit
command, and the driver now sets it up as, say, a multicast command, the
card executes the (i+1)th command with invalid parameters and stalls.
Our initial version of the 82559 driver would hang on an Itanium-based
system because of this problem, but adding a NOP after a command
solved it. Now our stress tests run for days without any problems
on the 82559.
I hope I have made this clear. If you have any questions, please feel
free to call me at 973-443-7469/973-442-0164.
I will try to explain as much as I can.
Regards,
Kallol
>
>
> A quick re-cap of my hardware:
>
> * i82557 quad 64-bit PCI (33 MHz) Ethernet card
> * DEC PC164 Motherboard with 21164 EV56 processor.
>
> I've been messing with eepro100 drivers for about 32 hours straight now
> (with a few hours off for pizza), and as an addendum to my last e-mail,
> this is what I have tried and found thus far:
>
> * The TX timeout is not dependent on what the card is connected to
> after all. Regardless of whether it is connected to a 3c905, Bay 350T,
> UB 100-tx hub, or tulip card, the "TX timeout" still happens. The
> timeout just happens a little quicker when connected via crossover to a
> 905b. . .
> * All cabling is tried and true on other network cards.
> * The TX timeout occurs under just about any heavy traffic. . . The initial
> timeout (the first one since boot) takes a little while to happen, but
> the successive timeouts come quicker. Here is a quick table of when the
> timeouts occur with the different driver versions:
>
> Traffic                   Driver  Kernel       Initial     Successive   Recovery
>                                                timeout(s)  timeouts(s)  time(s)
> heavy NFS read/writes     1.06    2.2.14       25-30       8-10         1-2
> MPEG streaming via SAMBA  1.06    2.2.14       35-40       12-15        1-2
> HEAVY FTP                 1.06    2.2.14       immediate   1-2          4-5
> telnet/ssh/http           1.06    2.2.14       none        -            -
> heavy NFS read/writes     1.09    2.2.14       30-45       10-12        8-10
> MPEG streaming via SAMBA  1.09    2.2.14       115-140     15-20        8-10
> HEAVY FTP                 1.09    2.2.14       immediate   <1           1-2
> telnet/ssh/http           1.09    2.2.14       none        -            -
> heavy NFS read/writes     1.09    2.2.16       30-45       10-12        8-10
> MPEG streaming via SAMBA  1.09    2.2.16       115-140     15-20        8-10
> HEAVY FTP                 1.09    2.2.16       immediate   <1           1-2
> telnet/ssh/http           1.09    2.2.16       ~30 min     ???          a long time
> ALL                       1.09    2.4.0-test5  N/A*
> * The OS locks up IMMEDIATELY on reaching the eepro100 code when it is
> compiled into the kernel, or upon insmod when it is built as a module,
> with NO ERROR MESSAGES.
>
> MESSAGES:
>
> On v1.06 of the driver, this is what /var/log/messages says:
> Jul 25 09:59:12 fosters kernel: eth0: Transmit timed out: status 0050 0000 at 322796/322810 command 000c0000.
> Jul 25 09:59:12 fosters kernel: eth0: Trying to restart the transmitter...
>
> On v1.09 of the driver this is what /var/log/messages says:
> Jul 30 03:25:26 fosters kernel: eth0: Transmit timed out: status 0050 0c00 at 107640/107670 command 200c0000.
>
> BOOT MESSAGE:
>
> Jul 29 22:39:31 fosters kernel: eth0: OEM i82557/i82558 10/100 Ethernet at 0x9000, 00:08:C7:91:08:72, IRQ 17.
> Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
> Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
> Jul 29 22:39:31 fosters kernel: General self-test: passed.
> Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
> Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
> Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
> Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
> Jul 29 22:39:31 fosters kernel: eth1: OEM i82557/i82558 10/100 Ethernet at 0x9800, 00:08:C7:91:08:73, IRQ 24.
> Jul 29 22:39:31 fosters kernel: Board assembly 009542-001, Physical connectors present: RJ45
> Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
> Jul 29 22:39:31 fosters kernel: General self-test: passed.
> Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
> Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
> Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
> Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
> Jul 29 22:39:31 fosters kernel: eth2: OEM i82557/i82558 10/100 Ethernet at 0xa000, 00:08:C7:66:80:F7, IRQ 28.
> Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
> Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
> Jul 29 22:39:31 fosters kernel: General self-test: passed.
> Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
> Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
> Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
> Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
> Jul 29 22:39:31 fosters kernel: eth3: OEM i82557/i82558 10/100 Ethernet at 0xa800, 00:08:C7:66:80:0F, IRQ 32.
> Jul 29 22:39:31 fosters kernel: Board assembly 009545-001, Physical connectors present: RJ45
> Jul 29 22:39:31 fosters kernel: Primary interface chip i82555 PHY #1.
> Jul 29 22:39:31 fosters kernel: General self-test: passed.
> Jul 29 22:39:31 fosters kernel: Serial sub-system self-test: passed.
> Jul 29 22:39:31 fosters kernel: Internal registers self-test: passed.
> Jul 29 22:39:31 fosters kernel: ROM checksum self-test: passed (0x24c9f043).
> Jul 29 22:39:31 fosters kernel: Receiver lock-up workaround activated.
>
> PCI:
>
> There don't seem to be any PCI conflicts, and I tried both enabling and
> disabling "PCI quirks" in the kernel, to no avail. . .
>
> Here is a cat of my /proc/pci:
>
> PCI devices found:
> Bus 0, device 7, function 0:
> PCI bridge: DEC DC21154 (rev 2).
> Medium devsel. Fast back-to-back capable. Master Capable.
> Latency=32.
> Min Gnt=4.
> Bus 0, device 8, function 0:
> Non-VGA device: Intel 82378IB (rev 67).
> Medium devsel. Master Capable. No bursts.
> Bus 0, device 9, function 0:
> VGA compatible controller: Matrox Millennium (rev 1).
> Medium devsel. Fast back-to-back capable. IRQ 19.
> Non-prefetchable 32 bit memory at 0x9000000 [0x9000000].
> Non-prefetchable 32 bit memory at 0x9800000 [0x9800000].
> Bus 0, device 11, function 0:
> IDE interface: CMD 646 (rev 1).
> Medium devsel. Fast back-to-back capable. IRQ 21. Master Capable. Latency=64. Min Gnt=2.Max Lat=4.
> I/O at 0x8000 [0x8001].
> Bus 1, device 4, function 0:
> Ethernet controller: Intel 82557 (rev 5).
> Medium devsel. Fast back-to-back capable. IRQ 17. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xa000000 [0xa000000].
> I/O at 0x9000 [0x9001].
> Non-prefetchable 32 bit memory at 0xa100000 [0xa100000].
> Bus 1, device 5, function 0:
> Ethernet controller: Intel 82557 (rev 5).
> Medium devsel. Fast back-to-back capable. IRQ 24. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xa200000 [0xa200000].
> I/O at 0x9800 [0x9801].
> Non-prefetchable 32 bit memory at 0xa300000 [0xa300000].
> Bus 1, device 6, function 0:
> Ethernet controller: Intel 82557 (rev 5).
> Medium devsel. Fast back-to-back capable. IRQ 28. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xa400000 [0xa400000].
> I/O at 0xa000 [0xa001].
> Non-prefetchable 32 bit memory at 0xa500000 [0xa500000].
> Bus 1, device 7, function 0:
> Ethernet controller: Intel 82557 (rev 5).
> Medium devsel. Fast back-to-back capable. IRQ 32. Master Capable. Latency=32. Min Gnt=8.Max Lat=56.
> Non-prefetchable 32 bit memory at 0xa600000 [0xa600000].
> I/O at 0xa800 [0xa801].
> Non-prefetchable 32 bit memory at 0xa700000 [0xa700000].
>
>
> and there don't seem to be any I/O issues either. Here is a cat of /proc/ioports:
>
> 0060-006f : keyboard
> 0070-007f : timer
> 0170-0177 : ide1
> 01f0-01f7 : ide0
> 02f8-02ff : serial(auto)
> 0376-0376 : ide1
> 03c0-03df : vga+
> 03e8-03ef : serial(auto)
> 03f6-03f6 : ide0
> 03f8-03ff : serial(auto)
> 8000-8007 : ide0
> 8008-800f : ide1
> a000000-a00001f : Intel Speedo3 Ethernet
> a200000-a20001f : Intel Speedo3 Ethernet
> a400000-a40001f : Intel Speedo3 Ethernet
> a600000-a60001f : Intel Speedo3 Ethernet
>
> TRIAL-AND-ERROR:
>
> Forcing different interface speeds via mii-diag does not fix anything:
> 100baseTX-FD -- timeout still occurs
> 100baseTX-HD -- timeout still occurs
> 10baseT-FD -- timeout still occurs
> 10baseT-HD -- timeout still occurs
>
> eepro100-diag:
>
> eepro100-diag.c:v2.02 7/19/2000 Donald Becker (becker@scyld.com)
> http://www.scyld.com/diag/index.html
> Index #1: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9000.
> A potential i82557 chip has been found, but it appears to be active.
> Either shutdown the network, or use the '-f' flag.
> Index #2: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0x9800.
> A potential i82557 chip has been found, but it appears to be active.
> Either shutdown the network, or use the '-f' flag.
> Index #3: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa000.
> A potential i82557 chip has been found, but it appears to be active.
> Either shutdown the network, or use the '-f' flag.
> Index #4: Found a Intel i82557 (or i82558) EtherExpressPro100B adapter at 0xa800.
>
> Changing MACROS:
>
> v1.06:
> txfifo/rxfifo: changes do nothing
> TX_RING_SIZE/RX_RING_SIZE: changes do nothing
> TX_TIMEOUT: Increasing this number decreases the frequency of the
> timeouts until it reaches roughly double its original value, at which
> point the interfaces become unusable until an ifdown/ifup.
>
> v1.09:
> txfifo/rxfifo: changes do nothing
> TX_RING_SIZE/RX_RING_SIZE: changes do nothing
> TX_TIMEOUT: Increasing this number at all makes the interfaces unusable
> until an ifdown/ifup.
>
> Also, I ported the code from v1.09 to v1.06 for the function "static
> void speedo_tx_timeout(struct net_device *dev)" to see what happens --
> the new "hybrid" driver exhibited the characteristics of the v1.09
> timeouts.
>
> Lastly, changing txqueuelen via ifconfig does nothing. . .
>
> Conclusion:
>
> v1.06 of the driver seemed to handle the TX timeouts more quickly than
> v1.09, but in v1.09 they were less frequent. I tried to compile v1.10
> and the experimental v1.11, but I got all kinds of compile errors and did
> not have the motivation to port them to kernel v2.2.16 after all
> my above failures.
>
> I have NO IDEA what is causing these TX timeouts. . . If any of the
> gurus here would be so kind as to aid me in my efforts to figure this
> out, I would greatly appreciate it! I will grant accounts on the
> troublesome machine if that will aid in troubleshooting, and I will
> code whatever I can if anyone can point me in a direction. . .
>
> Is there anything special that I have to set in the kernel for 64-bit
> PCI, BTW?
> Could the fact that this card is a 64-bit PCI card be the issue?
> Are there any special parameters that I could try tweaking that are
> Alpha-specific?
>
>
> Thank you for any help!!
>
> --Chris
>
> _______________________________________________
> eepro100 mailing list
> eepro100@scyld.com
> http://www.scyld.com/mailman/listinfo/eepro100
>
--
Phone: 973-443-7469
Telnet: 1-443-7469
www.kallolbiswas.com
kallol_biswas@hp.com