Tuesday, June 24, 2008

Redhat Cluster 4 how-to

Redhat Cluster 4: Steps for setting up a 2 node cluster

* This document assumes that you've read the PDF, easily found in the documentation section of Redhat's site. It just condenses it all for you if you want to make a 2-node cluster.
** Make sure you read all the tips at the very bottom, or you could be in for some pain.

A) What we want
1. Two servers in an active/passive cluster (one fails, the other takes over)
2. A shared storage area between them (disk array, luns from a SAN, etc.)
3. Floating IP
4. Bonded Ethernet interfaces (two interfaces on the same system acting as one highly available interface).
5. Power fencing (if system A appears unreachable, system B turns system A off and takes over the cluster).

B) Setup interface bonding (fairly easy)
1. I documented this at: http://askraina.blogspot.com
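
The short version, in case you don't want to chase the link, looks something like this on RHEL4 (addresses and device names below are examples; mode=1 is active-backup, which is what you want for availability rather than throughput):

  # /etc/modprobe.conf
  alias bond0 bonding
  options bond0 mode=1 miimon=100

  # /etc/sysconfig/network-scripts/ifcfg-bond0
  DEVICE=bond0
  IPADDR=10.0.0.11
  NETMASK=255.255.255.0
  ONBOOT=yes
  BOOTPROTO=none

  # /etc/sysconfig/network-scripts/ifcfg-eth0 (and likewise ifcfg-eth1)
  DEVICE=eth0
  MASTER=bond0
  SLAVE=yes
  ONBOOT=yes
  BOOTPROTO=none

Then "service network restart" and check /proc/net/bonding/bond0 to confirm both slaves are up.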

C) Startup gui
1. service ccsd start
2. service cman start
3. export DISPLAY=xxx.xxx.xxx.xxx:0
4. system-config-cluster &
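
Once everything is working, you'll probably want the cluster daemons to come up at boot on both nodes. The usual chkconfig incantation does it:

  # chkconfig ccsd on
  # chkconfig cman on
  # chkconfig fenced on
  # chkconfig rgmanager on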

D) Configure your cluster (in all steps, "close" saves your progress in the setup screens and "file->save" saves your cluster)
1. When you first start you’ll be asked to create a new cluster – do so.
2. I chose DLM (distributed lock manager) because GULM requires 3 or 5 servers in your cluster.
3. Name your cluster something nice.
4. Create a node by clicking on "cluster nodes" and then clicking "add nodes". Choose a quorum vote of 1.
5. Create a fence device. I chose HP's ILO. This required me to enter, as the hostname, the hostname of the actual ILO, not the hostname of the node in the cluster. I "named" my fences arbitrarily.
6. Assign those fences to the nodes by clicking on the nodes created in step 4 and clicking "manage fencing for this node". When you do, a window pops up and you'll click "add a new fence level". Then you'll click on that level (probably level 1 if you've just started) and click "add a fence to this level". Then you assign the fence created in step 5 to the appropriate node. (A sketch of what all this ends up looking like in the config file appears after step 14.)
7. Click "failover domains" and create a new failover domain. Use any name you want. I recommend unordered and unrestricted – let any node in the cluster run the service. Fool with it later if you have time.
8. Resources: resources are things like IP addresses, shared filesystems, or scripts. Generally, your "scripts" will be the ones in /etc/init.d (httpd, for example). This part of the setup is straightforward. I'm pretty sure I had to put my IPs in /etc/hosts, but I'm not sure if that's what made it work. BE VERY AWARE: "ifconfig -a" may or may not show eth0:1 or the like. I'm not sure how Redhat does it or if it is a bug, but both of my 2.6.9-11.ELsmp kernels brought up the IPs without bringing up a virtual interface. Also, this was clearly a bug: I couldn't create one particular IP address for the life of me (10.x.x.25). I then tried .98 and it worked fine. Something probably got "hosed" up in the plumbing.
9. Create services: give your service an arbitrary name (whatever you like) and add resources to it. Order matters – IPs and filesystems first and scripts last, because the scripts need the other two. You can also nest your resources: the topmost layer appears to be the base layer, and the lower layers are the things that rely on it. Either way, it's a little buggy in my opinion, so I didn't layer anything. It seems to work fine just laying them all down one after the other, from most basic to most complex. When you are ready, assign this service to a failover domain.
10. Save your config (it goes to /etc/cluster/cluster.conf)
11. Bring the other node into the cluster:
i. # service ccsd start (on the other node)
ii. # service cman start
iii. On the original system, bring up that gui again. This time you'll see a management console tab, and a button in the top right corner which reads "Send to cluster". The button saves the config to /etc/cluster/cluster.conf. If there's already one there (and there should be), it moves it to a backup file first and then saves. The last thing the button does is shoot the config over to the other system. If you need to, ftp will also do the trick, but you shouldn't need to.
iv. Exit the gui.
12. Start the cluster
i. On both systems, 1 node first and then the other, run the other two daemons:
ii. # service fenced start
iii. # service rgmanager start
13. Check the cluster
i. # clustat (you should see ‘stuff’)
14. If the cluster isn’t started
i. Go into the gui, go to the management tab, click on the service, and "enable" it. If it is in a failed state and won't start, take everything down with the "service x stop" commands and bring it all back up. If it still doesn't work, do some basic unix troubleshooting (permissions, groups, paths to resources/scripts, does it really mount like you think it will, is there an IP conflict, etc.). If that doesn't work, you're in for the long haul…
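
For reference, here's a rough sketch of what /etc/cluster/cluster.conf ends up looking like for the setup described in steps 2 through 9. All the names, addresses, and passwords below are made-up examples – your gui-generated file will differ in its details, so treat this as an illustration, not something to paste in:

  <?xml version="1.0"?>
  <cluster name="mycluster" config_version="1">
    <!-- two_node lets a 2-node cluster stay quorate with a single vote -->
    <cman two_node="1" expected_votes="1"/>
    <clusternodes>
      <clusternode name="node1" votes="1">
        <fence>
          <method name="1">
            <device name="ilo-node1"/>
          </method>
        </fence>
      </clusternode>
      <clusternode name="node2" votes="1">
        <fence>
          <method name="1">
            <device name="ilo-node2"/>
          </method>
        </fence>
      </clusternode>
    </clusternodes>
    <!-- hostname here is the ILO's hostname, NOT the node's (see step 5) -->
    <fencedevices>
      <fencedevice agent="fence_ilo" name="ilo-node1" hostname="node1-ilo" login="Administrator" passwd="secret"/>
      <fencedevice agent="fence_ilo" name="ilo-node2" hostname="node2-ilo" login="Administrator" passwd="secret"/>
    </fencedevices>
    <rm>
      <failoverdomains>
        <!-- unordered, unrestricted: any node may run the service (step 7) -->
        <failoverdomain name="mydomain" ordered="0" restricted="0">
          <failoverdomainnode name="node1" priority="1"/>
          <failoverdomainnode name="node2" priority="1"/>
        </failoverdomain>
      </failoverdomains>
      <resources>
        <ip address="10.0.0.50" monitor_link="1"/>
        <fs name="sharedfs" device="/dev/sda1" mountpoint="/shared" fstype="ext3"/>
        <script name="httpd" file="/etc/init.d/httpd"/>
      </resources>
      <!-- flat layout, most basic to most complex (see step 9) -->
      <service name="myservice" domain="mydomain" autostart="1">
        <ip ref="10.0.0.50"/>
        <fs ref="sharedfs"/>
        <script ref="httpd"/>
      </service>
    </rm>
  </cluster>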

Stuff that took me forever to figure out:

1) bonding was easy on one server that had a very up-to-date kernel. The one that was slightly behind in its upkeep had problems – bonding came up, but we had a ton of kernel errors that I didn't have time to really figure out, so I just upgraded the kernel – that fixed it.

2) As I mentioned above, Redhat Cluster didn’t (maybe doesn’t) make a virtual interface of the “ethx:1” variety. So don’t waste hours looking for it.

3) HEAR ME HEAR ME: Every start/stop script in Redhat Cluster requires a "status" option (e.g., /etc/init.d/mysql.server status). If you don't have one, Redhat Cluster will keep bouncing your service. You'll have to put a status check in there that returns 0 (zero) when the service is healthy.
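
If your script lacks one, a skeleton that keeps rgmanager happy looks something like this (mysqld is just an example daemon – substitute your own start/stop/match commands):

  #!/bin/sh
  # example init script skeleton; rgmanager calls it with start/stop/status
  case "$1" in
    start)
      /usr/bin/mysqld_safe &
      ;;
    stop)
      pkill -f mysqld
      ;;
    status)
      # rgmanager polls this; exit 0 means "running", non-zero means "failed"
      pgrep -f mysqld > /dev/null
      exit $?
      ;;
  esac
  exit 0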

4) Make sure 127.0.0.1 is only named "loopback"/localhost and not your server name. Map your server name to its real IP. You should also put any other names you can think of in your hosts file (like your fences).
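
In other words, /etc/hosts on each node should look something like this (names and addresses are examples):

  127.0.0.1    localhost.localdomain localhost
  10.0.0.11    node1.example.com node1
  10.0.0.12    node2.example.com node2
  10.0.0.21    node1-ilo.example.com node1-ilo
  10.0.0.22    node2-ilo.example.com node2-ilo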

5) HEAR ME HEAR ME: if you enable fencing and you are having problems with its stop/start procedures hanging, don't reboot. Your system will hang as it's coming up, forcing you to bring it up manually one process at a time, answering No at the offending process.

6) The default log location for Redhat Cluster is /var/log/messages. tail -f that file and grep for "clu" and you'll see all the cluster-related messages.
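
In one line:

  # tail -f /var/log/messages | grep clu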

7) If you ever get complaints from the gui at startup about XML syntax errors, well, it means that you screwed something up. I know, I can’t believe it either – the gui allows you to make impossible entries into your XML file. No matter how much you think you are doing it right, trust me, you messed up – and the gui let you.

8) As you struggle to get things going, ALWAYS check your services using "ps" and ALWAYS check that things are mounted or unmounted the way you expect. Until you get everything right, you have to babysit your system – you could literally get the same filesystem mounted on two boxes, with services trying to start on both. It's disgusting.
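
A quick sanity pass looks something like this (/shared is an example mountpoint; on RHEL4 the rgmanager daemon shows up in ps as clurgmgrd):

  # ps -ef | egrep 'ccsd|fenced|clurgmgrd'
  # mount | grep /shared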

9) If you're like me and you don't have a CDROM connected to your proliant blade server and only have an iso, and you mount that for your install, you're going to be asked (or might be) to insert one of your Redhat Linux install disks. This is nuts. It auto-ejects your iso, and suddenly you need to put a disk in – which it expects to auto-mount for you when you click "ok". The workaround is to copy everything from the iso to disk, delete/move all the rpms from the rpms directory that don't pertain to your specific type of kernel (smp, bigmem, and the like), and then install all the rpms with a * as an argument – it worked for me.
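
Roughly, the workaround looks like this (paths and the iso name are examples; on RHEL4 media the packages live under RedHat/RPMS):

  # mount -o loop rhel4-disc1.iso /mnt/iso
  # cp -a /mnt/iso/. /var/tmp/rhel4/
  # cd /var/tmp/rhel4/RedHat/RPMS
  #   (now delete/move the kernel rpms that don't match your kernel type)
  # rpm -Uvh *.rpm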

10) HEAR ME HEAR ME HEAR ME: If you find that when you simulate a network failure to make the cluster fail over (e.g., ifdown eth0), all you see is "CMANsendmesg failed: -101", then here's the problem: your power-fencing system is sending "poweroff" to the server, but the "acpid" service is interpreting it as "shutdown -h", which won't bring the server down unless it can do so gracefully. You need to go into /etc/acpi/events and change the config file, then hup the daemon (/etc/init.d/acpid stop/start). The config file might be named "sample.conf" or something; that's fine – it'll use that (man acpid).
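
The relevant stanza ends up looking something like this (the file name doesn't matter – acpid reads every file in /etc/acpi/events):

  # /etc/acpi/events/sample.conf
  # have the power button (which the fence agent effectively presses)
  # cut power immediately instead of attempting a graceful shutdown
  event=button/power.*
  action=/sbin/poweroff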
