IDE disk errors
J. G. LaBounty
jgl at unix.shell.com
Sat Jun 16 07:40:51 PDT 2001
Mike,
It might not have been clear in my note, but the Promise cards are only on the
Tyan 2500 motherboard. The other systems do not have them, although we are seeing our
problem on all the systems except the one with the Western Digital drives.
As a test, we have moved the Western Digital drives to the Tyan motherboard with
the Promise card. We have been running that way for a little more than a day and
have not seen any errors.
On the other Tyan boards with IBM drives on the Promise card, we have set noapic
per your suggestion. So far only one system has seen the error, which is still a
major improvement. We will continue to run this way for at least the weekend.
The noapic option did not seem to work on the Supermicro 370dle. We started seeing
the error "stuck on TLB IPI wait (CPU#0)". The node would occasionally break free,
but most times it would hang. We have backed the change out for the Supermicro.
As another "let's see what happens" change, we have moved swap to a disk by itself.
The reason for this is that about 90% (a guess) of the disk errors were on the disk
with the swap. We moved swap to the second disk and, sure enough, we started seeing
more disk errors on the second disk. We are not sure what this means, but the trend
was there for systems both with and without the Promise card.
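To check more rigorously whether the errors follow the swap disk, a minimal sketch like the one below could tally IDE errors per device from each node's kernel log before and after the swap move. It assumes Linux 2.4-style messages of the form "hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }"; the function name count_ide_errors is hypothetical, and the pattern may need adjusting for other kernels or log formats.

```python
import re
from collections import Counter

# Assumed 2.4-era IDE status-error line, e.g.
#   "kernel: hdc: dma_intr: status=0x51 { DriveReady SeekComplete Error }"
# Only "status=" lines whose flag list ends in "Error" are counted;
# follow-up "error=0x.." detail lines are ignored so each event counts once.
IDE_STATUS_ERROR = re.compile(
    r"\b(hd[a-z]): \w+: status=0x[0-9a-fA-F]+ \{[^}]*\bError \}"
)


def count_ide_errors(lines):
    """Return a Counter mapping IDE device name (e.g. 'hda') to error count."""
    counts = Counter()
    for line in lines:
        match = IDE_STATUS_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

For example, running count_ide_errors(open("/var/log/messages")) on each node, once with swap on the first disk and once with it on the second, would show whether the error counts really shift to whichever disk holds swap.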
> From: Michael Prinkey <mprinkey at aeolusresearch.com>
> To: Mark Hahn <hahn at coffee.psychology.mcmaster.ca>, beowulf at beowulf.org
> Subject: Re: IDE disk errors
>
> Hi Mark,
>
> I don't understand the issue either. I think that the Promise driver is
> not quite ready for prime time, or the card itself is introducing an
> interrupt handling problem. I have built several systems as storage
> servers and used the Promise controllers to install eight or more hard
> drives into single systems. Under low intensity usage, there were no
> errors, but as soon as I started pushing a lot of data onto those
> drives, those DriveReady SeekComplete Errors started showing up.
>
> I am anxious to understand this as well.
>
> Regards,
>
> Mike Prinkey
> Aeolus Research, Inc.
>
>
>
> Mark Hahn wrote:
> >
> > > Thanks for your input. We just this morning booted our 50-node
> > > Supermicro cluster with the noapic option. I will post to the group
> > > if it solves our problem.
> >
> > noapic does affect the delivery of IRQs, but it certainly
> > does not cause IDE disks to report hard/media errors.
> >
> > I suspect the problems reported were actual bad sectors:
> > IBM has apparently had some well-reported QA problems on DTLA's recently.
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
>
John
*------------------------------*------------------------------*
| J. G. LaBounty | Shell Services Int. |
| voice 713-245-2024 | E&P Information and Computing|
| email jgl at shellus.com | 1500 Old Spanish Trail 6Q25 |
| fax 713-245-2659 | Houston, Texas 77054 |
*------------------------------*------------------------------*