Note:
1) This blog is intended for readers with a good knowledge of Linux as an OS and the shell as a tool.
2) The steps are extracted from a RHEL 6.x HA implementation.
As with every HA/clustering technology, making any service (DB, NFS, web server, or any custom script) highly available is accomplished by going through:
1- Configuring the two servers as a cluster / preparing the machines/servers to see each other
2- Configuring the required service to fail over in the event of a hardware/network failure
#################### 1 #####################
For Linux, the list of prerequisites is as follows:
i- Back up the existing /etc/yum.repos.d/rhel-source.repo file, and edit it as shown below.
[rhel-local-base]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-local-HA]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt/HighAvailability
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-local]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt/LoadBalancer
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-local-RS]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt/ResilientStorage
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-local-SF]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt/ScalableFileSystem
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
[rhel-local-Server]
gpgcheck=1
name=Red Hat Enterprise Linux $releasever - $basearch - DVD
baseurl=file:///mnt/Server
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-redhat-release
and run
rpm --import <location_on_DVD>/RPM-GPG-KEY-redhat-release
This assumes you have the RHEL DVD (6.x in my case) in the DVD drive and mounted on /mnt, as referenced by the baseurl entries above.
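For reference, mounting the DVD on /mnt could look like the following (the device name /dev/cdrom and the ISO file name are assumptions; adjust for your system):
#mount -o ro /dev/cdrom /mnt
or, if you only have the ISO image:
#mount -o loop,ro rhel-server-6.x-x86_64-dvd.iso /mnt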
ii- Install the following software:
a. cman
b. rgmanager
c. luci
d. ricci
e. GFS2 (gfs2-utils)
f. clvmd (provided by lvm2-cluster)
Alternatively, you can use:
#yum groupinstall "HighAvailability"
#yum groupinstall "LoadBalancer"
#yum groupinstall "ResilientStorage"
#yum groupinstall "ScalableFileSystem"
Remember to refresh yum's metadata cache after adding the above repos, for example as shown below.
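A quick way to make yum pick up the new local repos and confirm they are visible:
#yum clean all
#yum repolist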
iii- You must have 5 IPs and 4 NIC interfaces available across the two systems (one public and one heartbeat interface per node, plus a virtual IP for the service).
iv- Next, configure the services:
a) disable and turn off iptables with
#chkconfig iptables off and
#service iptables stop
b) disable SELinux. SELinux is not a service that can be stopped with chkconfig/service; instead set SELINUX=disabled in /etc/selinux/config (takes effect after a reboot) and run
#setenforce 0
to switch to permissive mode for the current session
c) enable and run luci with
#chkconfig luci on and
#service luci start
d) enable and run ricci with
#chkconfig ricci on and
#service ricci start
e) enable and run rgmanager with
#chkconfig rgmanager on and
#service rgmanager start
f) enable and run clvmd with
#chkconfig clvmd on and
#service clvmd start
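If you prefer, steps c) through f) can be done in one shot once the packages above are installed:
#for svc in luci ricci rgmanager clvmd; do chkconfig $svc on; service $svc start; done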
Note: after luci has started, it should be accessible at
https://hostname_or_IP:8084.
v- Reset luci's password, just in case.
vi- The next step is to create a fence device. In my case I used HP's iLO as the fence device; the following URL can help you configure it:
http://h10025.www1.hp.com/ewfrf/wc/document?cc=us&lc=en&dlc=en&docname=c03138211
Use luci's GUI to create the fence device rather than editing the cluster configuration by hand as presented in HP's link.
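As a sanity check before wiring the fence into the cluster, you can query the iLO fence agent directly from either node (the iLO address and credentials below are placeholders for your environment):
#fence_ilo -a <iLO_IP_of_other_node> -l <iLO_user> -p <iLO_password> -o status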
vii- Each system's /etc/hosts should look like the list below, and the NICs on both hosts should be configured accordingly.
192.168.0.1 labnode1
192.168.0.2 labnode2
192.168.0.3 service_VIP
192.168.0.4 labnode1_hb
192.168.0.5 labnode2_hb
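For illustration, the heartbeat interface on labnode1 could be configured with a static address like this (the interface name eth1 and the netmask are assumptions; use whichever NIC carries your heartbeat link):
/etc/sysconfig/network-scripts/ifcfg-eth1
DEVICE=eth1
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.0.4
NETMASK=255.255.255.0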
viii- Enable and start cman:
#chkconfig cman on; service cman start
From this point onward, both systems are ready to be bundled into a cluster of whatever name you choose.
ix- Log in to luci, add the nodes, and create the cluster.
On the shell, the #clustat command reports the cluster status. The systems are now ready to be configured to run your service (in this case Oracle) in an HA configuration.
Note: when adding nodes in luci, use labnode1_hb and labnode2_hb as your cluster node names, i.e. the heartbeat-link host names. This is how RHEL HA uses the heartbeat link to check each node's health, and thereby determines whether to fence the other node when it fails to respond.
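After the cluster is created, the node definitions luci writes to /etc/cluster/cluster.conf should look roughly like the excerpt below (the cluster name, fence device names, and config_version are placeholders; your values will differ):
<cluster config_version="2" name="labcluster">
  <clusternodes>
    <clusternode name="labnode1_hb" nodeid="1">
      <fence>
        <method name="iLO">
          <device name="ilo_labnode1"/>
        </method>
      </fence>
    </clusternode>
    <clusternode name="labnode2_hb" nodeid="2">
      <fence>
        <method name="iLO">
          <device name="ilo_labnode2"/>
        </method>
      </fence>
    </clusternode>
  </clusternodes>
</cluster>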
################ 2 #################
For any service to be configured in a cluster with HA/LB functionality, you should understand / have the following:
a. A virtual IP / virtual host name for the service
b. Some shared storage with clvmd or GFS
c. A way to start the application - could be a predefined template in the cluster tool or your custom script
d. A way to stop the application - could be a predefined template in the cluster tool or your custom script
e. What you want to monitor, and how, to trigger service fail-over
In my case, the VIP/virtual host name is already configured in /etc/hosts, I have /u01 and /u02 available as mount points (with /u02 on shared storage), and I will use the predefined Oracle option in luci to create an HA instance of Oracle with its listener. Monitoring and start/stop will be handled by the same resource.
In Linux, the service to be configured sits on top of a cluster and a fail-over domain. The service is called a "service group", and the items listed above are called service group "resources". From one tool to another you may find different names: in HP Serviceguard, for example, a service group is called a package and all the resources are prerequisites for building a package; in VERITAS SF the names differ slightly, but the concept remains the same. That is, you should have one cluster, then one fail-over domain (not present in HP or VERITAS etc.), then a service group running on top of that fail-over domain with certain resources (IP, shared storage, start/stop scripts and monitoring).
With sufficient knowledge you can translate the above into HP's, Sun's (now Oracle's) or VERITAS's version.
Let's configure Oracle as an HA service in the Red Hat cluster configured above.
i- Verify that your /etc/hosts has the VIP entry (already done when configuring the NIC interfaces).
ii- Three slices have been presented to both servers: two shared disks, one for DB data and one for the quorum disk, and one disk for the Oracle binary installation. You may put the Oracle binaries on shared storage to avoid installing them on the secondary node, but then you will have to configure the oracle user and home directories accordingly.
Create the storage as:
1. # pvcreate /dev/mapper/dm-1
2. # vgcreate -cy vgdata /dev/mapper/dm-1
3. # lvcreate -L 10G -n lvdata vgdata
4. # mkfs.ext4 /dev/vgdata/lvdata
5. # vgchange -an vgdata
6. # pvcreate /dev/mapper/dm-2
7. # vgcreate vgapp /dev/mapper/dm-2    # note: -cy is omitted since vgapp is a local (non-clustered) VG
8. # lvcreate -L 10G -n lvapp vgapp
9. # mkfs.ext4 /dev/vgapp/lvapp
10. # vgchange -an vgapp
11. # pvcreate /dev/mapper/dm-3
12. # vgcreate -cy vgqrm /dev/mapper/dm-3
13. # lvcreate -L 2G -n lvqrm vgqrm
14. # mkfs.ext4 /dev/vgqrm/lvqrm
15. # vgchange -an vgqrm
Create the mount point under the root filesystem (on both nodes, since the service can run on either): #mkdir /u02
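A quick way to confirm which volume groups were created as clustered (the 'c' bit in the VG attributes) before handing them to the cluster:
#vgs -o vg_name,vg_attr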
iii- Access the luci GUI and add a failover domain - in my case it is OracleHA. The failover domain may or may not be prioritized or restricted.
iv- Access the resources tab in luci and start adding resources:
a) Add the IP resource (the service IP; the same IP that was given in /etc/hosts as the VIP); in my case it is 192.168.0.3.
b) Add the HA LVM resource. Provide a resource name, the quorum disk VG name vgqrm, and the LV name lvqrm.
c) Add the filesystem resource. Provide a resource name (database in my case), the mount point (/u02), the filesystem type (ext4 in my case), and the LV path (/dev/vgdata/lvdata in my case).
Note: while adding resources, keep the "child resource" option in mind. It defines the order and dependency of the resources you add.
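To illustrate how the child-resource nesting ends up in /etc/cluster/cluster.conf, the resource manager section might look roughly like this (resource values follow steps a)-c) above; the service name "oracle_service", the HA LVM resource name "halvm", and the ordered/restricted/recovery settings are placeholders):
<rm>
  <failoverdomains>
    <failoverdomain name="OracleHA" ordered="0" restricted="0">
      <failoverdomainnode name="labnode1_hb"/>
      <failoverdomainnode name="labnode2_hb"/>
    </failoverdomain>
  </failoverdomains>
  <service name="oracle_service" domain="OracleHA" recovery="relocate">
    <ip address="192.168.0.3">
      <lvm name="halvm" vg_name="vgqrm" lv_name="lvqrm">
        <fs name="database" device="/dev/vgdata/lvdata" mountpoint="/u02" fstype="ext4"/>
      </lvm>
    </ip>
  </service>
</rm>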
At this point you can submit the configuration and try starting the service on either node. You should have the VIP and /u02 available. You can check service fail-over as well.
v- Install Oracle on both nodes (on /u01, which is a local, non-shared mount point). At this point you will probably need to hand the servers over to an Oracle DBA to go through the OS configuration related to Oracle and the Oracle installation itself.
vi- Add the Oracle resource. Provide the resource name and other Oracle details such as ORACLE_HOME and ORACLE_SID.
vii- Add Oracle listener resource.
viii- At this point you should be able to run the Oracle instance on either node using the luci GUI. If the service fails, refer to the cluster log files and /var/log/messages to see what is actually going on. Use clustat to verify the status of the cluster and the service.
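For a manual fail-over test from the shell, clusvcadm can relocate the service between nodes (here "oracle_service" stands for whatever you named the service group in luci):
#clustat
#clusvcadm -r oracle_service -m labnode2_hb
#clustat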