[tulip] race condition leading to hung tx queue in .92, .93
Donald Becker
becker@scyld.com
Tue Feb 19 12:15:01 2002
On Fri, 15 Feb 2002, Chris Friesen wrote:
> We have discovered a race condition that could lead to a hung tx queue
> in the .92 and .93 drivers. Near the end of tulip_start_xmit(), there
> is the following code:
>
>         if ( ! tp->tx_full)
>                 netif_unpause_tx_queue(dev);
>         else
>                 netif_stop_tx_queue(dev);
> The problem occurs if the check fails (tx_full is set) and, before the
> else clause runs, we are interrupted by tulip_interrupt(), which cleans
> up enough transmitted packets to clear tx_full and tbusy. The interrupt
> handler returns, and we proceed to set tbusy anyway. At this point we
> are left with tbusy set and tx_full cleared, and the driver never
> recovers.
Yes. The Tulip driver has a different structure from most of the other
PCI netdrivers, and thus the check for the full->empty race that is
implemented in pci-skeleton.c does not apply.
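
The window described above is the classic lost-wakeup pattern. Purely as
an illustration, here is a userspace sketch (POSIX threads plus C11
atomics, not driver code; tx_full and queue_stopped merely stand in for
the ring and tbusy state) that forces the same interleaving and shows
why re-checking after the stop recovers:

    /* Sketch of the full->empty race; names are illustrative only. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    static atomic_int tx_full;        /* set when the tx ring fills up     */
    static atomic_int queue_stopped;  /* stands in for tbusy / queue state */

    /* Plays the role of tulip_interrupt(): reclaim entries, wake queue. */
    static void *fake_interrupt(void *arg)
    {
        (void)arg;
        atomic_store(&tx_full, 0);
        atomic_store(&queue_stopped, 0);
        return NULL;
    }

    int main(void)
    {
        pthread_t irq;

        /* start_xmit fills the last slot and decides the queue must stop... */
        atomic_store(&tx_full, 1);

        /* ...but before it acts, the interrupt runs and empties the ring.   */
        pthread_create(&irq, NULL, fake_interrupt, NULL);
        pthread_join(&irq, NULL);

        /* start_xmit resumes and acts on its stale decision.                */
        atomic_store(&queue_stopped, 1);
        printf("without re-check: stopped=%d tx_full=%d  <- hung\n",
               atomic_load(&queue_stopped), atomic_load(&tx_full));

        /* The proposed fix: re-check tx_full after stopping the queue.      */
        if (!atomic_load(&tx_full))
            atomic_store(&queue_stopped, 0);
        printf("with re-check:    stopped=%d tx_full=%d  <- recovered\n",
               atomic_load(&queue_stopped), atomic_load(&tx_full));
        return 0;
    }
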
> The fix is to change this code to the following:
>
>         if ( ! tp->tx_full)
>                 netif_unpause_tx_queue(dev);
>         else {
>                 netif_stop_tx_queue(dev);
>
>                 /* handle case of tulip_interrupt() running under our feet */
>                 if ( ! tp->tx_full)
>                         netif_start_tx_queue(dev);
>         }
Correct, although the preferred call is
        netif_resume_tx_queue(dev);
The
        netif_start_tx_queue(dev);
call currently does the same thing, but it is intended to be used when
the interface is first started.
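
For reference, the tail of tulip_start_xmit() with the re-check and the
preferred call would then look roughly like this (an untested sketch
that simply combines the fix above with netif_resume_tx_queue();
everything else in the function is unchanged):

        if ( ! tp->tx_full)
                netif_unpause_tx_queue(dev);
        else {
                netif_stop_tx_queue(dev);

                /* tulip_interrupt() may have emptied the ring and cleared
                   tx_full between the check above and the stop, so re-check
                   and resume rather than leave the queue stopped for good. */
                if ( ! tp->tx_full)
                        netif_resume_tx_queue(dev);
        }
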
Donald Becker                          becker@scyld.com
Scyld Computing Corporation            http://www.scyld.com
410 Severn Ave. Suite 210              Second Generation Beowulf Clusters
Annapolis MD 21403                     410-990-9993