The CUMULUS Cluster

The Cluster of UniMaginably UseLess uServers

Original Blurb

This is an attempt to build a Beowulf-style compute cluster. It will most likely be the absolute slowest cluster around in the modern age, but that's not the point. The goal here is to set up a platform for learning the ins and outs of parallel systems, especially the programming. I presume setting up the hardware will be non-trivial as well.

To build the cluster, we will be using 16 IBM PS/2 55SXs kindly donated by the Peoria Unified School District, along with some of our own equipment. These machines are all based around an i386SX running at a speedy 16MHz. Hopefully, we can come up with some 387 FPUs for these too (donations are welcome!). From what I hear about the Linux-MCA ESDI drivers, using NFS root will probably be faster and better (for more reasons than speed). So, out with ESDI!

The master node will be made out of my own parts. It will be an i486DX3 running at 99MHz (OK, so in Intel terminology, a 486DX4-100). It will have somebody's SCSI card and a spare Fujitsu 1GB SCSI HD. For interfaces, it will have at least one 3Com 3c509b ethernet. Other interfaces will depend on the interconnect we end up using.

We currently have two options for the inter-node interconnect bus. I think I've got enough parts to do one complete round of 10base2 across the cluster, but no more (and possibly less). Or, I've got enough IBM token ring parts to do pretty much anything I could ever need to do (that includes MCA and ISA cards, MAUs, etc). But, I still don't know which to use. I don't know whether the TR parts I have are 4 or 16Mb/sec. If they're the latter, we'll definitely use at least one TR interconnect; if the former, then the possibility is low (but still an option if I can't find enough coax stuff). [Does the channel bonding stuff work with TR?] We can always go with the hybrid approach: one 10base2 interconnect and one (or more) TR interconnect(s). More decisions to come.
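
To put rough numbers on that choice, here is a minimal sketch comparing best-case transfer times at the three raw line rates mentioned above. It is an idealization only: it ignores token-passing delay, collisions, framing, and protocol overhead, and the 1 MB message size is a made-up figure chosen purely for illustration.

```python
# Raw line rates in megabits per second, as named in the text above.
LINKS_MBPS = {
    "token ring (4Mb/sec)": 4,
    "10base2 ethernet": 10,
    "token ring (16Mb/sec)": 16,
}

# Hypothetical message size (an assumption, not a measured workload).
PAYLOAD_BYTES = 1_000_000


def ideal_transfer_seconds(payload_bytes, rate_mbps):
    """Best-case time to move payload_bytes at rate_mbps (1 Mb = 10**6 bits)."""
    return payload_bytes * 8 / (rate_mbps * 1_000_000)


if __name__ == "__main__":
    for name, rate in LINKS_MBPS.items():
        seconds = ideal_transfer_seconds(PAYLOAD_BYTES, rate)
        print(f"{name}: {seconds:.2f} s")
```

On paper, then, 16Mb/sec token ring beats 10base2 and 4Mb/sec token ring loses to it; real throughput on any of them would be lower than these figures.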

If we use NFS root on the nodes, we could actually use a separate interconnect to the master node just for mounting the filesystem and running apps off of it, leaving another interconnect just for inter-node connections. The filesystem network would almost definitely be TR.

If you've got old stuff lying around that's too slow/old to be useful to you, bring/ship it over! (I probably will NOT pay shipping on PC stuff, unless I really need it!) Things needed for Echidna: TR cabling, 10base2 cabling/connectors/Tees/etc, 10baseT stuff (anything), more/faster machines, 387 FPUs, etc.

Current Status

02 Aug 1998: Eight (8) CUMULUS nodes are ready hardware-wise. It'll probably take an hour or so total to get the five new nodes booting off fogbow (I must manually boot each one to get the 802.5 address off of the TR card, then put it in /etc/bootptab, and run makenode). Past that, all that's left is to install PVM and then we're ready for parallel apps! Very soon. I'd say end of this week max.
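
For illustration, the bootptab step could be sketched roughly like this. This is not the actual makenode script (which isn't shown here); the function name, the IP numbering scheme, the netmask, the root path, and the sample hardware address are all assumptions. Only the node naming (node000, node001, ...) comes from this page. The tags used (ht, ha, ip, sm, hn, rp) are standard bootptab tags, with ht=ieee802 as the hardware type for token ring.

```python
def bootptab_entry(index, mac, subnet="192.168.1", root_base="/tftpboot"):
    """Build one bootptab line for a node.

    index     -- node number, giving names like node000 (naming from the page)
    mac       -- 802.5 hardware address read off the TR card at boot
    subnet    -- assumed private /24 prefix (hypothetical)
    root_base -- assumed directory holding each node's NFS root (hypothetical)
    """
    name = f"node{index:03d}"
    # bootptab wants the hardware address as 12 bare hex digits.
    hw = mac.replace(":", "").replace("-", "").upper()
    if len(hw) != 12 or any(c not in "0123456789ABCDEF" for c in hw):
        raise ValueError(f"bad hardware address: {mac!r}")
    ip = f"{subnet}.{10 + index}"  # assumed numbering: node000 gets .10
    return (
        f"{name}:ht=ieee802:ha={hw}:ip={ip}:"
        f"sm=255.255.255.0:hn:rp={root_base}/{name}:"
    )


if __name__ == "__main__":
    # Placeholder address; a real one comes from booting the node by hand.
    print(bootptab_entry(3, "02:00:00:00:00:03"))
```

The printed line would be appended to /etc/bootptab for each new node once its address is known.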

23 Jul 1998: OK, so I finally just stopped complaining and did it. There are now three nodes booting and running (node000, node001, node002). The bootable NFS rootfs's for each were generated using an automatic script, so adding more is just as simple as running a few commands and adding the MAC to bootptab. They're really not that slow. I'm impressed. Now I just need to get some of the distributed API libs running so we can start running a real Beowulf here. Soon. Real soon. BTW, yes, we are running token ring as both the boot medium and the message-passing medium. It's running at 16Mb/sec timing at this point. AFAIK, this is the first Beowulf to ever use a token ring as a backbone (all others, of course, use 10[0[0]]baseT[X], Myrinet, etc).

20 Jul 1998: I'm officially "rebirthing" this cluster under the new name of CUMULUS Cluster, along with moving the cluster to a new master node, fogbow. fogbow itself will be getting upgraded a little here in the future. Also, I may move the master node to something else once we get going (possibly even to the new SPARCstation 2). That's it until further notice. I'm working on the cluster in between compiles of gtkFAIM, and I'm not exactly the best multitasker around (that, of course, would be Linux itself).

3 Jul 1998: It felt like it was time to make a comment again. Summer school did end. Brock did come back. I did fix the token ring problems. I did get more ISA token ring cards. The problem: either I rip the 3Com TokenLink from ihpled or I run the cluster ring at 4Mb. The ISA cards I have don't run at more than 4Mb (well, one of them does, but I don't have my driver even started for that one yet). Hence, the former solution does seem to be the better one. Once again, I'm waiting on hardware. I don't like bringing down a server for just one thing, so I'm waiting for the EISA VGA card and the other EISA SCSI card to come so that I can do that for ihpled too. Three changes sounds like a much better reason than one. Also, I think I'm going to make the master node fogbow instead of my previous plan, especially since there's an EISA motherboard coming to put in fogbow. I'd like to put the weirdest and most arcane hardware I can find into this cluster. I think I'm succeeding quite well in that goal so far. Also, I've got 10 Madge MCA token ring cards here that are supposedly super-high-performance. That is, if you have a driver for them. They seem to be based on the same chip as the one mentioned above, so a driver might just be easy. See ya.

9 Jun 1998, 20:10: I stare at my comments below and wonder. That is, if I'm ever going to get this working. Brock will be back a week from tomorrow and I will be having the last day of summer school on the same. Hopefully this weekend... First, I must get at least one of these ISA token cards going in the master node. I stole the 3Com to use in ihpled, so currently, the main node has no connection to the rest! Eeek! Must get that fixed... Someday, someday...

24 May 1998, 23:20: Well. It looks like the hardware is pretty much planned out. Since we don't have nearly enough ethernet equipment, we shall be using token ring (yea! I've really begun to enjoy token ring -- much more amusing than ether!). I've got two 8-port (each) 8228-clone MAUs (made by Andrews) and one [really cool] Thomas-Conrad TC4050 (16-port) MAU. So, we have enough ports to satisfy 2 complete 16Mb/sec 16-node rings, if we so desire. And we've got plenty of token ring cards, of course. Since we can use CAT3 wiring, that's no problem either. I've got ALL the machines here (in the garage, I'll have pictures up in a few days). They're not stacked up in a usable way yet -- I've not found the space to do that yet -- so globally, that's what we're waiting on. At least that's what I'd like to hear. What I'm really waiting on is getting a decent bootable NFS tree going on the master node so all the nodes can boot off of it. [Please let one thing be clear at this point: I HATE REDHAT!] Enough yet? I think that's it for now. Oh, BTW, school's over. Oh, and BTW, [summer] school starts on 1 Jun 98...in a week. [Hopefully] some progress will come in the next three months. I'm told we can keep the machines until PUSD needs them back or the end of summer vacation, whichever comes first (last?).


Adam Fritzler
Last modified: Sun Aug 2 02:02:06 MST 1998