[Beowulf] SATA II - PXE+NFS - diskless compute nodes
Donald Becker
becker at scyld.com
Tue Dec 12 16:34:32 PST 2006
On Sat, 9 Dec 2006, Mark Hahn wrote:
> >> I would hazard that any DHCP/PXE type install server would struggle with
> >> 2000 requests
>
> a single server (implying 1 gb nic?) might have trouble with the tftp part,
> but I don't see why you couldn't scale up by splitting the tftp part
> off to multiple servers. I'd expect a single DHCP (no TFTP) would be
> plenty in all cases. 100 tftp clients per server would probably
> be pretty safe.
Some of the limits you encounter aren't solved by multiple machines
running TFTP servers, but can be solved by a single clever TFTP server.
TFTP is subject to something like the Ethernet "capture effect", where
once a machine misses a packet, it's increasingly likely to continue to
fail. And with most PXE clients, failure is fatal. So you want to avoid
any TFTP retry, even if that means deferring the response to other
clients when you detect a retry attempt.
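To make the "defer the others" idea concrete, here is a minimal sketch of the scheduling decision only, with no networking. It assumes a retry can be spotted as a repeated ACK for the same block number; the class and method names are hypothetical and are not taken from any existing TFTP server.

```python
from collections import deque


class RetryAwareScheduler:
    """Toy scheduler: clients that had to retry are served before everyone else."""

    def __init__(self):
        self.normal = deque()   # clients progressing without trouble
        self.urgent = deque()   # clients that have repeated an ACK
        self.last_block = {}    # client -> last block number it acknowledged

    def on_ack(self, client, block):
        """Record an ACK. A repeated ACK for the same block counts as a retry."""
        is_retry = self.last_block.get(client) == block
        self.last_block[client] = block
        if is_retry:
            # Promote the client ahead of everyone who is doing fine.
            if client in self.normal:
                self.normal.remove(client)
            if client not in self.urgent:
                self.urgent.append(client)
        elif client not in self.urgent and client not in self.normal:
            self.normal.append(client)
        return is_retry

    def next_client(self):
        """Pick who gets the next data packet. Retrying clients always go first."""
        if self.urgent:
            return self.urgent.popleft()
        if self.normal:
            return self.normal.popleft()
        return None
```

A real server would also have to decide how long the healthy clients can be held off before they time out themselves; that trade-off is not modelled here.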
Another problem is that PXE clients seem to have some corner cases with
ARP. It's best not to re-ARP during a download, even responding to an
external request if some other machine is trying to ARP your client.
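One way to keep the server from re-ARPing a client in the middle of a transfer, assuming a Linux boot server with iproute2, is to pin the client's neighbor entry for the duration of the download and drop it afterwards. The sketch below only builds the commands; the helper names are hypothetical, and the MAC address would come from the client's DHCP request.

```python
def pin_neighbor_cmd(client_ip, client_mac, dev):
    """Build an `ip neigh` command that installs a permanent ARP entry."""
    return ["ip", "neigh", "replace", client_ip,
            "lladdr", client_mac, "dev", dev, "nud", "permanent"]


def unpin_neighbor_cmd(client_ip, dev):
    """Build the command that removes the pinned entry once the download is done."""
    return ["ip", "neigh", "del", client_ip, "dev", dev]


# To apply a command (needs root), something like:
#   import subprocess
#   subprocess.run(pin_neighbor_cmd("10.0.0.5", "00:11:22:33:44:55", "eth0"), check=True)
```

This covers only the server's own ARP behavior. It does nothing about other machines on the segment ARPing the client.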
> I personally like the idea of putting one admin server in each rack.
> they don't have to be fancy servers, by any means.
Any installed machine is added complexity. And these are machines you
have to keep consistent with potentially many boot images. (Imagine cases
where you are detecting the hardware and serving the proper image.)
> > There are a few modifications you have to make to increase the number of bootps before
> > it fails.
>
> do you mean you'd expect load problems even with a single server
> dedicated only to dhcp?
That only seems unlikely until you pair a script interpreter with the DHCP
server. A default backlog of only 25 packets seems tiny when you are
running scripts that make SQL queries before responding. But NO ONE would
do that, right? Right?
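A rough back-of-the-envelope model shows why. The sketch below assumes a single-threaded server, one request per client spread evenly over a boot window, and a fixed time per reply. The function name and the timing figures in the usage lines are illustrative assumptions; only the backlog of 25 comes from the text above.

```python
def dropped_requests(n_clients, window_s, service_s, backlog=25):
    """Count requests lost to a full queue.

    n_clients requests arrive evenly spaced over window_s seconds. The server
    answers one at a time, taking service_s seconds each, and holds at most
    `backlog` unanswered packets (including the one being worked on).
    """
    pending = []      # finish times of accepted, not yet answered requests
    free_at = 0.0     # when the server finishes its current work
    dropped = 0
    for i in range(n_clients):
        now = i * window_s / n_clients
        pending = [t for t in pending if t > now]   # forget answered requests
        if len(pending) >= backlog:
            dropped += 1                            # queue full, packet lost
            continue
        start = max(now, free_at)
        free_at = start + service_s
        pending.append(free_at)
    return dropped


# 2000 nodes booting over 10 seconds:
fast = dropped_requests(2000, 10.0, 0.001)   # 1 ms per reply, plain lookup
slow = dropped_requests(2000, 10.0, 0.050)   # 50 ms per reply, script plus SQL
```

With the fast reply time the queue never fills; with the slow one the server can only answer about 20 requests per second, so most of the burst is dropped and has to be retried.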
> > So now to figure out my next step. I will need local space for logs
> > and data/temp data files.
>
> why would you want logs local?
You want your kernel messages to be logged to the same machine that
is serving your kernels. Which should be the same server that provides
the kernel modules and modprobe tables that match the kernel. And you
want the logging to happen as the very first thing after booting the
kernel. (Boot kernel, load network driver, DHCP for loghost, dump kernel
messages, only then activate additional hardware and do other risky
things.)
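As an illustration of the "dump kernel messages" step, here is a minimal sketch that forwards already-collected kernel log lines to a loghost as classic UDP syslog packets. It assumes the loghost address was learned from the DHCP reply (the Log Server option is DHCP option 7) and that the loghost listens on the usual syslog port 514; the function names are hypothetical, and a real boot environment would do this in an initrd tool rather than Python.

```python
import socket

KERN_FACILITY = 0    # syslog facility code for kernel messages
SEVERITY_INFO = 6    # syslog severity "informational"


def syslog_packet(message, severity=SEVERITY_INFO, facility=KERN_FACILITY):
    """Format one line as a bare syslog datagram: <priority>message."""
    pri = facility * 8 + severity
    return f"<{pri}>{message}".encode("utf-8", errors="replace")


def dump_to_loghost(lines, loghost, port=514):
    """Send each log line to the loghost over UDP and return how many were sent."""
    sent = 0
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for line in lines:
            sock.sendto(syslog_packet(line), (loghost, port))
            sent += 1
    return sent
```

UDP gives no delivery guarantee, but it needs nothing beyond a working network driver and an address, which is the point of doing it this early.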
--
Donald Becker                     becker at scyld.com
Scyld Software                    Scyld Beowulf cluster systems
914 Bay Ridge Road, Suite 220     www.scyld.com
Annapolis MD 21403                410-990-9993