Sunday, December 2, 2012

Red Hat Linux Cluster on RHEL 6.3

Today I am posting a two-node Red Hat active-passive failover cluster on RHEL 6.3 that I got to set up, and it works like a charm. I was really impressed with the way the Red Hat Cluster Suite works and its ease of installation.


There are two critical parts to setting up a cluster, and both are equally important.

====================================
Part 1 - Steps to do before starting the installation:
====================================

Make sure that all the Red Hat Cluster Suite packages are installed.
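On RHEL 6 these ship in the High Availability add-on; assuming that channel is available to yum, something like this pulls in everything used below:

[root@node01 ~]#yum groupinstall "High Availability"
[root@node01 ~]#yum install ccs omping httpd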

Add the host entries to /etc/hosts on both nodes and set up passwordless ssh, which helps when copying files between them:
[root@node01 ~]#cat /etc/hosts
192.168.10.2    node01.test.com        node01
192.168.10.3    node02.test.com        node02
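A quick way to get the passwordless ssh in place (run this on node01, and the mirror image on node02):

[root@node01 ~]#ssh-keygen -t rsa
[root@node01 ~]#ssh-copy-id root@node02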

You will have to configure the interconnect so that the nodes can exchange multicast frames with each other. Run the command below on both nodes simultaneously and check that both unicast and multicast responses come back:

[root@node01 ~]#omping 192.168.10.3 192.168.10.2
192.168.10.3 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.10.3 :   unicast, seq=1, size=69 bytes, dist=0, time=0.131ms
192.168.10.3 : multicast, seq=1, size=69 bytes, dist=0, time=0.174ms

[root@node02 ~]# omping 192.168.10.2 192.168.10.3
192.168.10.2 : waiting for response msg
192.168.10.2 : joined (S,G) = (*, 232.43.211.234), pinging
192.168.10.2 :   unicast, seq=1, size=69 bytes, dist=0, time=0.224ms
192.168.10.2 : multicast, seq=1, size=69 bytes, dist=0, time=0.269ms

Make sure the services below are switched off on both nodes, so you can rule them out later if you have to troubleshoot a problem.

[root@node01 ~]#chkconfig ip6tables off
[root@node01 ~]#chkconfig iptables off
[root@node01 ~]#chkconfig acpid off

SELinux should also be disabled on both nodes:
[root@node01 ~]# grep -i ^SELINUX= /etc/selinux/config
SELINUX=disabled
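SELINUX=disabled only takes effect after the reboot at the end of this part; if SELinux is currently enforcing, you can drop it to permissive right away:

[root@node01 ~]#setenforce 0
[root@node01 ~]#getenforce
Permissive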

Add the following entry to /etc/httpd/conf/httpd.conf on both nodes (node01, node02) so that apache listens on the cluster's virtual IP:
Listen 192.168.10.10:80
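Since rgmanager is going to start and stop apache as a cluster resource, make sure httpd itself does not start at boot on either node:

[root@node01 ~]#chkconfig httpd off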

Enable the cluster services so they start at boot (run on both nodes):
[root@node01 ~]#chkconfig ricci on
[root@node01 ~]#chkconfig cman on
[root@node01 ~]#chkconfig rgmanager on

Set a password for the ricci user on both nodes; ccs will ask for it when it connects:
[root@node01 ~]#passwd ricci

Reboot both nodes to make the changes take effect.

========================
Part 2 - Starting with Installation:
========================
I have connected both nodes to an EMC CLARiiON and mapped two LUNs, one of which I will be using for fencing.

[root@node01 ~]# powermt display dev=all |grep emcpower
Pseudo name=emcpowerb
Pseudo name=emcpowera
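The fs resource added later mounts /dev/emcpowerb1 as ext3, so the data LUN needs a partition and filesystem first (a one-time step from either node; emcpowera stays raw for the SCSI fencing):

[root@node01 ~]#fdisk /dev/emcpowerb        # create a single partition, emcpowerb1
[root@node01 ~]#mkfs.ext3 /dev/emcpowerb1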

Start the ricci service on both nodes before you begin configuring the cluster.
[root@node01 ~]#service ricci start
Starting ricci: [ OK ]

* Create a basic cluster config file and give your cluster a name
[root@node01 ~]#ccs -h node01 --createcluster webcluster
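At this point /etc/cluster/cluster.conf on node01 contains just an empty skeleton, roughly like this:

[root@node01 ~]# cat /etc/cluster/cluster.conf
<cluster config_version="1" name="webcluster">
  <fence_daemon/>
  <clusternodes/>
  <cman/>
  <fencedevices/>
  <rm/>
</cluster>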

* Add both nodes to the cluster
[root@node01 ~]#ccs -h node01 --addnode node01 --votes=1 --nodeid=1
[root@node01 ~]#ccs -h node01 --addnode node02 --votes=1 --nodeid=2

* Set the fence daemon properties: post_fail_delay=0 means the cluster waits 0 seconds before fencing a node after it fails, and post_join_delay=30 means it waits 30 seconds before fencing a node after a node joins the fence domain
[root@node01 ~]#ccs -h node01 --setfencedaemon post_fail_delay=0 post_join_delay=30

* Set the cman properties for a two-node cluster with a single expected vote, so that the cluster stays quorate and the services keep running on one node if the other fails
[root@node01 ~]#ccs -h node01 --setcman two_node=1 expected_votes=1

* Add a fence method that will be used by the nodes
[root@node01 ~]#ccs -h node01 --addmethod scsi node01
[root@node01 ~]#ccs -h node01 --addmethod scsi node02

* Add the fence device with the fence_scsi agent and a log file that helps with troubleshooting; aptpl=1 makes the key registrations persist across a power loss
[root@node01 ~]#ccs -h node01 --addfencedev scsi_dev agent=fence_scsi devices=/dev/emcpowera logfile=/var/log/cluster/fence_scsi.log aptpl=1

* Add a fence instance for each node, which the cluster uses when it has to fence that node
[root@node01 ~]#ccs -h node01 --addfenceinst scsi_dev  node01 scsi key=1
[root@node01 ~]#ccs -h node01 --addfenceinst scsi_dev  node02 scsi key=2

* Add an unfence instance for each node; unfencing runs when a node starts up and registers that node's key with the shared device
[root@node01 ~]#ccs -h node01 --addunfenceinst scsi_dev  node01  key=1 action=on
[root@node01 ~]#ccs -h node01 --addunfenceinst scsi_dev  node02  key=2 action=on
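Once the cluster is up, you can check that both keys are registered on the fencing LUN with sg_persist from the sg3_utils package (output will look roughly like this):

[root@node01 ~]# sg_persist -n -i -k -d /dev/emcpowera
  PR generation=0x2, 2 registered reservation keys follow:
    0x1
    0x2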

* Create a failover domain and add the nodes to it. ordered=1 makes failover follow the per-node priorities, and nofailback=1 keeps the service from failing back when the higher-priority node comes back online after a failover
[root@node01 ~]#ccs -h node01 --addfailoverdomain web-failover ordered=1 nofailback=1
[root@node01 ~]#ccs -h node01 --addfailoverdomainnode web-failover node01 1
[root@node01 ~]#ccs -h node01 --addfailoverdomainnode web-failover node02 2

* Create a service under the cluster
[root@node01 ~]#ccs -h node01 --addservice web domain=web-failover recovery=relocate autostart=1

* Now we will add all the resources to the global cluster config, which the service will use when it starts. I am going to add three resources: a SAN filesystem that will get mounted when the service starts, a virtual IP, and the apache service, all brought up on the active node.

[root@node01 ~]#ccs -h node01 --addresource fs name=web_fs device=/dev/emcpowerb1 mountpoint=/var/www fstype=ext3
[root@node01 ~]#ccs -h node01 --addresource ip address=192.168.10.10 monitor_link=yes
[root@node01 ~]#ccs -h node01 --addresource apache name=apache_server config_file=conf/httpd.conf server_root=/etc/httpd shutdown_wait=10

* Then add the subservices to the service in order, so that the filesystem gets mounted, then the IP is configured, and then the apache service is started on the active cluster node.
[root@node01 ~]#ccs -h node01 --addsubservice web fs ref=web_fs
[root@node01 ~]#ccs -h node01 --addsubservice web ip ref=192.168.10.10
[root@node01 ~]#ccs -h node01 --addsubservice web apache ref=apache_server

* Propagate the cluster config to both nodes
[root@node01 ~]#cman_tool version -r
[root@node01 ~]#ccs -h node01 --sync --activate

* Start the cluster services on all nodes
[root@node01 ~]#ccs -h node01 --startall

[root@node01 ~]# cman_tool nodes
Node  Sts   Inc   Joined               Name
   1   M    112   2012-11-30 23:44:28  node01
   2   M    116   2012-11-30 23:44:28  node02

[root@node01 ~]# ccs -h node01 --lsnodes
node01: votes=1, nodeid=1
node02: votes=1, nodeid=2

[root@node01 ~]# ccs -h node01 --lsfencedev
scsi_dev: logfile=/var/log/cluster/fence_scsi.log, aptpl=1, devices=/dev/emcpowera, agent=fence_scsi

[root@node01 ~]#ccs -h node01 --lsfailoverdomain
web-failover: restricted=0, ordered=1, nofailback=0
  node01: priority=1
  node02: priority=2

[root@node01 ~]# ccs -h node01 --lsservices
service: name=web, exclusive=0, domain=web-failover, autostart=1, recovery=relocate
  fs: ref=web_fs
  ip: ref=192.168.10.10
  apache: ref=apache_server
resources:
  fs: name=web_fs, device=/dev/emcpowerb1, mountpoint=/var/www, fstype=ext3
  ip: monitor_link=yes, address=192.168.10.10
  apache: name=apache_server, shutdown_wait=10, config_file=conf/httpd.conf, server_root=/etc/httpd

[root@node01 ~]# clustat
Cluster Status for webcluster @ Sun Dec  2 21:58:31 2012
Member Status: Quorate

 Member Name                                                     ID   Status
 ------ ----                                                     ---- ------
 node01                                                              1 Online, Local, rgmanager
 node02                                                              2 Online, rgmanager

 Service Name                                                     Owner (Last)                                                     State
 ------- ----                                                     ----- ------                                                     -----
 service:web                                                      node01                                                           started
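Before calling it done, it is worth a manual relocation test with clusvcadm; something like this should move the service over to node02 (and -r web -m node01 moves it back):

[root@node01 ~]# clusvcadm -r web -m node02
Trying to relocate service:web to node02...Success
service:web is now running on node02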

The /etc/cluster/cluster.conf file can also be edited manually; just make sure the config_version number is incremented, and run ccs_config_validate to check the validity of the config file.
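For example, if the file currently starts with config_version="2", bump it before propagating (the exact number depends on how many changes ccs has already made):

<cluster config_version="3" name="webcluster">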

[root@node01 ~]# ccs_config_validate
Configuration validates
[root@node01 ~]#cman_tool version -r
[root@node01 ~]#ccs -h node01 --stopall
[root@node01 ~]#ccs -h node01 --startall


Hope this document was helpful!!
