Clusterware 11gR2 – Setting up an Active/Passive failover configuration
- by Gilles Haro
Oracle is providing a large range of interesting solutions to ensure High Availability of the database.    Dataguard, RAC or even both configurations (as recommended by Oracle for a Maximum Available Architecture - MAA) are the most frequently found and used solutions.     But, when it comes to protecting a system with an Active/Passive architecture with failover capabilities, people often thinks to other expensive third party cluster systems.   Oracle Clusterware technology, which comes along at no extra-cost with Oracle Database or Oracle Unbreakable Linux, is - in the knowing of most people - often linked to Oracle RAC and therefore, is seldom used to implement failover solutions.    Oracle Clusterware 11gR2  (a part of Oracle 11gR2 Grid Infrastructure)  provides a comprehensive framework to setup automatic failover configurations.     It is actually possible to make "failover-able'", and then to protect, almost any kind of application (from the simple xclock to the most complex Application Server).  Quoting Oracle:    “Oracle Clusterware is a portable cluster software that allows clustering of single servers so that they cooperate as a single system. Oracle Clusterware also provides the required infrastructure for Oracle Real Application Clusters (RAC). In addition Oracle Clusterware enables the protection of any Oracle application or any other kind of application within a cluster.”  In the next couple of lines, I will try to present the different steps to achieve this goal : Have a fully operational 11gR2 database protected by automatic failover capabilities.  I assume you are fluent in installing Oracle Database 11gR2, Oracle Grid Infrastructure 11gR2 on a Linux system and that ASM is not a problem for you (as I am using it as a shared storage).    If not, please have a look at Oracle Documentation.  As often, I made my tests using an Oracle VirtualBox environment. The scripts are tested and functional on my system. Unfortunately, there can always be a typo or a mistake.   This blog entry does not replace a course around the Clusterware Framework.    I just hope it will let you see how powerful it is and that it will give you the whilst to go further with it...   Note : This entry has been revised (rev.2) following comments from Philip Newlan.   Prerequisite     2 Linux boxes (OELCluster01 and OELCluster02) at the same OS level. I used OEL 5 Update 5 with an Enterprise Kernel.     Shared Storage (SAN). On my VirtualBox system, I used Openfiler to simulate the SAN     Oracle 11gR2 Database (11.2.0.1)     Oracle 11gR2 Grid Infrastructure (11.2.0.1)       Step 1 - Install the software     Using asmlib, create 3 ASM disks (ASM_CRS, ASM_DTA and ASM_FRA)     Install Grid Infrastructure for a cluster (OELCluster01 and OELCluster02 are the 2 nodes of the cluster)              Use ASM_CRS to store Voting Disk and OCR.         Use SCAN.             Install Oracle Database Standalone binaries on both nodes.              Use asmca to check/mount the disk groups on 2 nodes         Use dbca to create and configure a database on the primary node                      Let's name it DB11G.             Copy the pfile, password file to the second node.             Create adump directoty on the second node.                               Step 2 - Setup the resource to be protected  After its creation with dbca, the database is automatically protected by the Oracle Restart technology available with Grid Infrastructure. Consequently, it restarts automatically (if possible) after a crash (ex: kill -9 smon).     A database resource has been created for that in the Cluster Registry.     We can observe this with the command : crsctl status resource that shows and ora.dba11g.db entry.  Let's save the definition of this resource, for future use :      mkdir -p /crs/11.2.0/HA_scripts     chown oracle:oinstall /crs/11.2.0/HA_scripts     crsctl status resource ora.db11g.db -p > /crs/11.2.0/HA_scripts/myResource.txt    Although very interesting, Oracle Restart is not cluster aware and cannot restart the database on any other node of the cluster. So, let's remove it from the OCR definitions, we don't need it !     srvctl stop database -d DB11G     srvctl remove database -d DB11G    Instead of it, we need to create a new resource of a more general type : cluster_resource. Here are the steps to achieve this :     Create an action script :  /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh       #!/bin/bash    export ORACLE_HOME=/oracle/product/11.2.0/dbhome_1          export ORACLE_SID=DB11G    case $1 in        'start')          $ORACLE_HOME/bin/sqlplus /nolog <<EOF          connect / as sysdba          startup        EOF          RET=0          ;;        'stop')          $ORACLE_HOME/bin/sqlplus /nolog <<EOF          connect / as sysdba          shutdown immediate        EOF          RET=0          ;;     'clean')          $ORACLE_HOME/bin/sqlplus /nolog <<EOF          connect / as sysdba          shutdown abort    ##for i in `ps -ef | grep -i $ORACLE_SID | awk '{print $2}' ` ;do kill -9 $i; done       EOF          RET=0          ;;            'check')           ok=`ps -ef | grep smon | grep $ORACLE_SID | wc -l`           if [ $ok = 0 ]; then             RET=1           else             RET=0           fi           ;;        '*')      RET=0          ;;        esac    if [ $RET -eq 0 ]; then           exit 0        else           exit 1        fi      This script must provide, at least, methods to start, stop, clean and check the database. It is self-explaining and contains nothing special. Just be aware that it must be runnable (+x), it runs as Oracle user (because of the ACL property - see later) and needs to know about the environment. Also make sure it exists on every node of the cluster. Moreover, as of 11.2, the clean method is mandatory. It must provide the “last gasp clean up”, for example, a shutdown abort or a kill –9 of all the remaining processes.     chmod +x /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh     scp  /crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh   oracle@OELCluster02:/crs/11.2.0/HA_scripts    Create a new resource file, based on the information we got from previous  myResource.txt .     Name it myNewResource.txt.      myResource.txt  is shown below.     As we can see, it defines an ora.database.type resource, named ora.db11g.db.     A lot of properties are related to this type of resource and do not need to be used for a cluster_resource.     NAME=ora.db11g.db        TYPE=ora.database.type        ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--        ACTION_FAILURE_TEMPLATE=        ACTION_SCRIPT=        ACTIVE_PLACEMENT=1        AGENT_FILENAME=%CRS_HOME%/bin/oraagent%CRS_EXE_SUFFIX%        AUTO_START=restore        CARDINALITY=1        CHECK_INTERVAL=1        CHECK_TIMEOUT=600        CLUSTER_DATABASE=false        DB_UNIQUE_NAME=DB11G        DEFAULT_TEMPLATE=PROPERTY(RESOURCE_CLASS=database) PROPERTY(DB_UNIQUE_NAME= CONCAT(PARSE(%NAME%, ., 2), %USR_ORA_DOMAIN%, .)) ELEMENT(INSTANCE_NAME= %GEN_USR_ORA_INST_NAME%)        DEGREE=1        DESCRIPTION=Oracle Database resource        ENABLED=1        FAILOVER_DELAY=0        FAILURE_INTERVAL=60        FAILURE_THRESHOLD=1        GEN_AUDIT_FILE_DEST=/oracle/admin/DB11G/adump        GEN_USR_ORA_INST_NAME=        GEN_USR_ORA_INST_NAME@SERVERNAME(oelcluster01)=DB11G        HOSTING_MEMBERS=        INSTANCE_FAILOVER=0        LOAD=1        LOGGING_LEVEL=1        MANAGEMENT_POLICY=AUTOMATIC        NLS_LANG=        NOT_RESTARTING_TEMPLATE=        OFFLINE_CHECK_INTERVAL=0        ORACLE_HOME=/oracle/product/11.2.0/dbhome_1        PLACEMENT=restricted        PROFILE_CHANGE_TEMPLATE=        RESTART_ATTEMPTS=2        ROLE=PRIMARY        SCRIPT_TIMEOUT=60        SERVER_POOLS=ora.DB11G        SPFILE=+DTA/DB11G/spfileDB11G.ora        START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)        START_TIMEOUT=600        STATE_CHANGE_TEMPLATE=        STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)        STOP_TIMEOUT=600        UPTIME_THRESHOLD=1h        USR_ORA_DB_NAME=DB11G        USR_ORA_DOMAIN=haroland        USR_ORA_ENV=        USR_ORA_FLAGS=        USR_ORA_INST_NAME=DB11G        USR_ORA_OPEN_MODE=open        USR_ORA_OPI=false        USR_ORA_STOP_MODE=immediate          VERSION=11.2.0.1.0      I removed database type related entries from myResource.txt and modified some other to produce the following myNewResource.txt.     Notice the NAME property that should not have the ora. prefix     Notice the TYPE property that is not ora.database.type but cluster_resource.     Notice the definition of ACTION_SCRIPT.     Notice the HOSTING_MEMBERS that enumerates the members of the cluster (as returned by the olsnodes command).               NAME=DB11G.db        TYPE=cluster_resource        DESCRIPTION=Oracle Database resource        ACL=owner:oracle:rwx,pgrp:oinstall:rwx,other::r--        ACTION_SCRIPT=/crs/11.2.0/HA_scripts/my_ActivePassive_Cluster.sh        PLACEMENT=restricted        ACTIVE_PLACEMENT=0        AUTO_START=restore        CARDINALITY=1        CHECK_INTERVAL=10          DEGREE=1        ENABLED=1        HOSTING_MEMBERS=oelcluster01 oelcluster02        LOGGING_LEVEL=1        RESTART_ATTEMPTS=1          START_DEPENDENCIES=hard(ora.DTA.dg,ora.FRA.dg) weak(type:ora.listener.type,uniform:ora.ons,uniform:ora.eons) pullup(ora.DTA.dg,ora.FRA.dg)        START_TIMEOUT=600        STOP_DEPENDENCIES=hard(intermediate:ora.asm,shutdown:ora.DTA.dg,shutdown:ora.FRA.dg)        STOP_TIMEOUT=600        UPTIME_THRESHOLD=1h   Register the resource.    Take care of the resource type. It needs to be a cluster_resource and not a ora.database.type resource (Oracle recommendation) .       crsctl add resource DB11G.db  -type cluster_resource -file /crs/11.2.0/HA_scripts/myNewResource.txt           Step 3 - Start the resource     crsctl start resource DB11G.db    This command launches the ACTION_SCRIPT with a start and a check parameter on the primary node of the cluster.    Step 4 - Test this  We will test the setup using 2 methods.     crsctl relocate resource DB11G.db    This command calls the ACTION_SCRIPT  (on the two nodes)  to stop the database on the active node and start it on the other node.    Once done, we can revert back to the original node, but, this time we can use a more "MS$ like" method :Turn off the server on which the database is running.     After short delay, you should observe that the database is relocated on node 1.       Conclusion  Once the software installed and the standalone database created (which is a rather common and usual task), the steps to reach the objective are quite easy :          Create an executable action script on every node of the cluster.            Create a resource file.            Create/Register the resource with OCR      using the resource file.            Start the resource.      This solution is a very interesting alternative to licensable third party solutions.  References     Clusterware 11gR2 documentation     Oracle Clusterware Resource Reference     Clusterware for Unbreakable Linux     Using Oracle Clusterware to Protect A Single Instance Oracle Database 11gR1 (to have an idea of complexity)     Oracle Clusterware on OTN       Gilles Haro          Technical Expert - Core Technology, Oracle Consulting