[Beowulf] choosing a high-speed interconnect
Mark Hahn
hahn at physics.mcmaster.ca
Tue Oct 12 22:40:03 PDT 2004
> There are multiple 128 node (and greater) IB systems that are stable
> and are being used for production apps. The #7 top500 machine from
I thank you for this street-level information! it's frustrating
to only know a technology based on marketing...
> RIKEN is using IB and has been in production for over six months. My
> cluster at Sandia (about 128 nodes) is being used for IB R&D and
still, 128 nodes is fairly small these days. would you characterize
your applications as fairly bandwidth-intensive? I know that many
of the apps that run at really big weapons-related labs tend to
emphasize latency to an extreme degree, but perhaps your codes are
not like that?
> >300 nodes that are for production use. All run great under Linux, and
> you have multiple IB vendors to choose from (Voltaire, Topspin,
> InfiniCon, and Mellanox).
well, aren't all of those just minor modifications of the same
mellanox chip? that's what I meant by "not-really-multi-vendor".
the IB world would like to compare itself to the eth world,
but it's a very, very long way away from being really vendor-independent.
> Almost all of the IB software development is
> done under Linux first and then ported to other OSes.
very interesting! do you mean user-level IB software and middleware?
I had the impression (circa OLS in July) that there was no real
unification of linux IB stacks, and significant problems with
windows-centricness of the code.
> QP scaling isn't as critical an issue if the MPI implementation sets
> up the connections as needed (kind of a lazy connection setup). Why
> set up all-to-all QP connectivity if the MPI implements an all-to-all
> or other collectives as tree-based pt2pt algorithms?
that sounds reasonable, but does it work out well? I guess it would
depend mainly on whether the same collective groups get reused or
change frequently, since every new group means new connections.
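
to make the lazy-setup question concrete, here is a rough toy model, not
real verbs or MPI code. it only counts how many peers each rank ends up
connected to when a connection is created on first use. the class
LazyConnections, the function binomial_bcast, and the choice of a
binomial-tree broadcast as the collective are all assumptions made for
illustration; a real MPI may use different algorithms.

```python
# Toy model (hypothetical names, not a real MPI or IB API): count the
# connections each rank holds when connections are opened on first use.

class LazyConnections:
    def __init__(self, nranks):
        self.nranks = nranks
        # peers[r] is the set of ranks that r has an open connection to
        self.peers = [set() for _ in range(nranks)]

    def send(self, src, dst):
        # Adding to a set is idempotent, so a reused pair costs nothing new.
        self.peers[src].add(dst)
        self.peers[dst].add(src)

    def max_per_rank(self):
        return max(len(p) for p in self.peers)


def binomial_bcast(conns, root):
    """Binomial-tree broadcast: each round, every rank that already has
    the data sends it to the rank `mask` positions further along."""
    n = conns.nranks
    mask = 1
    while mask < n:
        for rel in range(mask):          # ranks (relative to root) holding data
            if rel + mask < n:
                conns.send((root + rel) % n, (root + rel + mask) % n)
        mask *= 2


if __name__ == "__main__":
    n = 128

    fixed = LazyConnections(n)
    for _ in range(1000):                # same root every time: full reuse
        binomial_bcast(fixed, root=0)

    moving = LazyConnections(n)
    for root in range(n):                # root changes on every call
        binomial_bcast(moving, root)

    print("eager all-to-all peers per rank:", n - 1)
    print("lazy, fixed root, max peers    :", fixed.max_per_rank())
    print("lazy, rotating root, max peers :", moving.max_per_rank())
```

in this model, 128 ranks with a fixed root never need more than 7
connections per rank, against 127 for eager all-to-all setup. rotating the
root through every rank raises that to 13, which is the reuse effect in
miniature: the count grows as the communication pattern changes, though
here it stays far below all-to-all.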
> Network congestion on
> larger clusters can be reduced by using source based adaptive
> (multipath) routing instead of the standard IB static routing.
interesting, again! in the most recent visit by S&M people from
an IB vendor, they claimed that there was no problem and that any
reasonably smart switch would have a routing manager smart enough
to prevent the non-problem.
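
for what it's worth, the congestion argument can be sketched with a toy
model of one leaf switch and its uplinks. everything here is an
assumption for illustration: the function names, the "destination modulo
uplink count" rule standing in for a static forwarding table, and plain
round-robin standing in for source-chosen paths (it is multipath, but not
adaptive). it says nothing about how any particular vendor's switch or
routing manager behaves.

```python
# Toy model (hypothetical, not vendor behavior): load on a leaf switch's
# uplinks under static destination-based routing vs. source-chosen paths.
from collections import Counter


def static_uplink_load(dests, n_uplinks):
    # Static routing: the uplink depends only on the destination, so every
    # flow to destinations that map alike shares one link.
    return Counter(dst % n_uplinks for dst in dests)


def multipath_uplink_load(dests, n_uplinks):
    # Source-based multipath: the sender spreads its flows over the
    # available uplinks, here by simple round-robin.
    return Counter(i % n_uplinks for i, _ in enumerate(dests))


if __name__ == "__main__":
    # An unlucky pattern: four flows whose destinations all map to uplink 0.
    dests = [8, 12, 16, 20]
    print("static   :", dict(static_uplink_load(dests, 4)))
    print("multipath:", dict(multipath_uplink_load(dests, 4)))
```

with this pattern the static rule puts all four flows on one uplink, while
the round-robin choice puts one flow on each. whether real traffic hits
such patterns often enough to matter is exactly the open question.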
> Also remember that IB has a lot more field experience than the latest
> Myricom hardware and MX software stack.
to me, "recent myricom" means e-cards, which I, perhaps naively,
think are more of a known quantity than anything IB. and I haven't
managed to lay hands on MX yet <sniff>.
I'm really glad to hear early adopters of IB speak up; I still claim
that they actually are early adopters, though ;)
regards, mark hahn.