more NetGear missing ARPs
Neale Banks
neale@lowendale.com.au
Sat Jun 12 09:49:53 1999
Hi Keith,
On Sat, 12 Jun 1999, Keith Owens wrote:
> On Fri, 11 Jun 1999 21:28:46 +1000 (EST),
> Neale Banks <neale@lowendale.com.au> wrote:
> >As others have suggested, it's as though the arp reply is either not seen
> >by the card or not delivered up by the driver.
>
> Does switching the card in and out of promiscuous mode have any effect?
> "ifconfig eth0 promisc", try the Cisco, "ifconfig eth0 -promisc", try
> Cisco again.
In failing to recreate this problem earlier today, I think I have
corroborated an earlier observation of <bart@vianet.net.au>:
"It seems to only fail if there is net activity on the box Im
trying to ping even if its small say even 5-10k a sec,"
This morning (being Saturday) I just could not get this problem to perform
- then I remembered the above and that there isn't much traffic in the
office on a Saturday. This evening I have now had it perform (i.e.
incomplete ARP) 5/5 times by first setting up a 1400 byte ping to the
Cisco from the other Linux host on the subnet. Anyone hazard a guess as
to the significance of this? It also possibly means that on a switch you
may be far less likely to see this problem (and with a 100Mb card there's
a fair chance of being on a switch?).
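For anyone wanting to repeat this, here is a rough sketch of how the check could be scripted. The function name and the 192.0.2.1 address are placeholders, not the real setup, and it assumes "arp -n" marks an unresolved entry with the word "incomplete" in the hardware-address column.

```shell
# Sketch only: arp_incomplete is a hypothetical helper name.
# Reads "arp -n" style text on stdin and exits 0 if the address given
# as $1 has an entry marked incomplete, 1 otherwise.
arp_incomplete() {
    awk -v ip="$1" '
        $1 == ip && /incomplete/ { found = 1 }
        END { exit !found }
    '
}

# Possible usage (192.0.2.1 stands in for the Cisco):
#   on the other Linux host:  ping -s 1400 192.0.2.1
#   on the NetGear box:       arp -n | arp_incomplete 192.0.2.1 && echo "ARP still incomplete"
```

The background ping comes first because the failure only seems to show up when there is traffic on the segment.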
Regarding switching in and out of promiscuous mode:
I've now booted a 6th time and got the incomplete ARP again. Putting eth0
into promiscuous mode cleared the problem straight away; I "arp -d"ed the
entry and it re-appeared on a (successful) ping of the Cisco; I then put the
card back out of promiscuous mode and the ARPing appears to remain happy.
You may be on to something here :-)))
7th reboot: initially "incomplete" ARP; setting promisc didn't immediately
help, I deleted the "incomplete" entry and could happily ping the Cisco;
turned off promisc, deleted the ARP entry and can ping the Cisco (but
there was a brief appearance of the incomplete ARP entry - could that be
just that this ARP took a while to complete?).
For the record, the 1400-byte ping was still running through all of this
(seq# > 4000 now).
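The promisc on/off sequence from the 7th reboot could be written down roughly as below. This is a sketch under assumptions: the interface name, the 192.0.2.1 placeholder for the Cisco, the function name and the RUN dry-run switch are all illustrative. By default it only prints the commands.

```shell
# Sketch only: names and the address are placeholders, not the real setup.
CISCO=${CISCO-192.0.2.1}   # placeholder address for the Cisco
IFACE=${IFACE-eth0}
RUN=${RUN-echo}            # default prints commands; set RUN= (empty) to execute as root

promisc_cycle() {
    $RUN ifconfig "$IFACE" promisc     # into promiscuous mode
    $RUN arp -d "$CISCO"               # drop the (incomplete) entry
    $RUN ping -c 3 "$CISCO"            # should now resolve and answer
    $RUN ifconfig "$IFACE" -promisc    # back out of promiscuous mode
    $RUN arp -d "$CISCO"               # force a fresh ARP exchange
    $RUN ping -c 3 "$CISCO"            # still OK if the filter got reset
}
```

Calling promisc_cycle with RUN left at its default prints the six commands in order, which is handy for checking the sequence before running it for real.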
> If that works, it could be the problem I saw with Xircom RBEM56G. The
> TX/RX rings get confused when CSR6 is changed too quickly and the MAC
> filters are not setup correctly. If promisc on/off works, try tulip.c
> from ftp://ftp.ocs.com.au/pub/xircom-RBEM56G-howto-2.tar.gz. This
> patched tulip 0.91 is mainly intended for RBEM56G but the CSR6 fix
> should work on other tulip cards, set strict_csr6=1.
OK, grabbed that and made and installed a new kernel-image package (tulip
as a module). Re-started the 1400-byte ping; test reboot - phew, it came
up OK and (unsurprisingly) still has the incomplete ARP problem. Changed
/etc/modules so that the tulip line is "tulip strict_csr6=1". Rebooted
and I still have the incomplete ARP :-(
OK, just to be sure, I've changed
    static int strict_csr6 = 0;
to:
    static int strict_csr6 = 1;
in tulip.c, and rebuilt. Rebooted and still incomplete ARP. {:-(
Checking dmesg: yes we have "tulip.c:v0.91 4/14/99
becker@cesdis.gsfc.nasa.gov (modified by danilo@cs.uni-magdeburg.de for
XIRCOM CBE, kaos@ocs.com.au for RBEM56G (-2))". Also, setting promiscuous
mode still fixes it and resetting out of promiscuous mode still allows
successful ARP.
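In case it helps anyone double-check which driver is loaded, the dmesg check could be scripted along these lines. This is a sketch: the function name is made up, and it only assumes the banner contains both "tulip.c:v0.91" and "RBEM56G" as quoted above.

```shell
# Sketch only: patched_tulip_loaded is a hypothetical helper name.
# Reads kernel log text on stdin and exits 0 if it contains both the
# tulip v0.91 version string and the RBEM56G patch marker.
patched_tulip_loaded() {
    awk '
        /tulip\.c:v0\.91/ { ver = 1 }
        /RBEM56G/         { patch = 1 }
        END { exit !(ver && patch) }
    '
}

# Possible usage:
#   dmesg | patched_tulip_loaded && echo "patched tulip 0.91 is loaded"
```

The two strings are matched independently because the banner may wrap over more than one line in the log.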
One final cross-check: I've stopped the 1400-byte pings and am rebooting -
up it came and can ARP the Cisco happily.
What else can we try?
Thanks,
Neale.