
We’re now ready to create an Oracle RAC database.
It’s broadly similar to creating a single-instance Oracle database.
The major differences come with how the various infrastructure components are administered.
We’ll get to that in good time.
Quick links to all the tasks:
- Task #1: Create Additional ASM Diskgroups.
- Task #2: Create an Oracle 12c RAC Database.
- Task #3: Test Oracle 12c GI and RAC Database.
- Task #4: Common Administration Tasks.
Task #1: Create Additional ASM Diskgroups.
So far we have the +CRS ASM Diskgroup which provides storage for the Votedisk, the OCR and the GIMR database (_MGMTDB). Our own RAC database will also use ASM storage, so we’ll need to create a couple more ASM Diskgroups before we create the database.
On racnode1, login as the grid user and fire up the ASM Configuration Assistant (asmca):
[grid@racnode1 ~]$ . oraenv
ORACLE_SID = [grid] ? +ASM1
The Oracle base has been set to /u01/app/grid
Set your DISPLAY environment variable so it points back to your workstation:
[grid@racnode1 ~]$ export DISPLAY=<your-workstation-or-IP>:0.0
[grid@racnode1 ~]$ asmca

Click on Create. Enter DATA for the Disk Group Name, select External for the Redundancy level and select ASMDISK02 and ASMDISK03. Then click OK:

Click OK:

Click Create:

Enter FRA for the Disk Group Name, select External for the Redundancy level and select ASMDISK04 and ASMDISK05. Then click OK:

Click OK in the Disk Group:Creation pop-up window:

Then click Exit to close the ASM Configuration Assistant.
Et voila! Two shiny new ASM Diskgroups ready for our RAC database to use. It will be pleased.
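If you prefer the command line to asmca, the same two Diskgroups could be created from SQL*Plus instead. This is only a sketch, assuming the +ASM1 environment set above and the same ASM disk paths; a Diskgroup created this way may also need mounting on the second node (for example with srvctl start diskgroup -diskgroup DATA -node racnode2):

[grid@racnode1 ~]$ sqlplus / as sysasm <<'EOF'
-- External redundancy, two disks per Diskgroup, matching the asmca steps above
CREATE DISKGROUP DATA EXTERNAL REDUNDANCY
  DISK '/dev/oracleasm/disks/ASMDISK02', '/dev/oracleasm/disks/ASMDISK03';
CREATE DISKGROUP FRA EXTERNAL REDUNDANCY
  DISK '/dev/oracleasm/disks/ASMDISK04', '/dev/oracleasm/disks/ASMDISK05';
EOF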
Task #2: Create an Oracle 12c RAC Database.
A quick way to set your oracle environment correctly when no databases currently exist on the server is to edit the /etc/oratab file to include a dummy line:
[oracle@racnode1 ~]$ vi /etc/oratab
dummy:/u01/app/oracle/product/12.1.0/dbhome_1:N
Use the standard oraenv script to set your environment, then unset the ORACLE_SID:
[oracle@racnode1 ~]$ . oraenv
ORACLE_SID = [oracle] ? dummy
The Oracle base has been set to /u01/app/oracle
[oracle@racnode1 ~]$ unset ORACLE_SID
You can tell you’re pointing to the correct set of Oracle executables by using the which command:
[oracle@racnode1 database]$ which dbca
/u01/app/oracle/product/12.1.0/dbhome_1/bin/dbca
Finally, set your DISPLAY variable correctly and fire up the Database Configuration Assistant (dbca):
[oracle@racnode1 ~]$ export DISPLAY=<your-workstation-or-IP>:0.0
[oracle@racnode1 ~]$ dbca
Select Create Database, then click Next:

Select Advanced Mode, then click Next:

Use these values to complete the next screen, then click Next:
Field | Value |
---|---|
Database Type | Oracle Real Application Clusters (RAC) database |
Configuration Type | Admin-Managed |
Template | General Purpose or Transaction Processing |

Enter RAC1DB.mynet.com for the Global Database Name and uncheck Create As Container Database. Then click Next:

Ensure both racnode1 and racnode2 are in the Selected column, then click Next:

Use these values to complete the next screen, then click Next:
Field | Value |
---|---|
Run Cluster Verification Utility (CVU) Checks Periodically | Checked |
Configure Enterprise Manager (EM) Database Express | Checked |
EM Database Express Port | 5500 |
Register with Enterprise Manager (EM) Cloud Control | Checked |
OMS Host | oraemcc.mynet.com |
OMS Port | 4903 |
EM Admin Username | SYSMAN |
EM Admin Password | (referenced here) |

Select Use the Same Administrative Password for All Accounts. Enter the SYSDBA password twice then click Next:

Complete the next screen using the values shown in the screen shot, then click Next:

Create a password for the ASMSNMP ASM instance account, then click OK:

Just click Next:

On the Memory tab, select Use Automatic Memory Management and enter 2048 MB for the Memory Size, then click the Character Sets tab:

On the Character Sets tab, select Use Unicode (AL32UTF8) then click Next:

Ensure Create Database is checked, then click Next:

The Prerequisite Checks screen highlights the swap size failure which we can safely ignore. Click Next:

On the Summary screen, click Finish:

The familiar progress bar tells you how things are going:

Eventually you’ll see the Finish screen. Click Close and you’re done.

You now have an Oracle 12c RAC database up and running with 2 instances.
In the next section we’ll run through a series of tests to find out if all is well.
Task #3: Test Oracle 12c GI and RAC Database.
After you build out your cluster and RAC database, it’s important that you test everything is working properly, especially when it will become a production system. Testing really falls into two categories, functionality testing and workload testing. Functionality testing focuses on finding out if the configuration behaves in the way it should. Workload testing focuses on finding out if the configuration can handle the expected workload in terms of response time and throughput.
You can view an outline test plan here. It’s a decent start at putting together a comprehensive test plan, but it does omit three key areas. It does not contain any workload testing, network failure testing or any destructive testing. Let’s get testing and see what happens!
Section #1: Check & Test Clusterware.
The first set of checks and tests focuses on Clusterware. It gets us comfortable with using the various Clusterware administration commands and helps with identifying where certain resources are located.
Test #1B. Check: Resource TARGET and STATE columns.
The crs_stat command is deprecated, but it still works in 12c.
What we’re looking for here is any line where the Target and State column values differ. The Target value is what the status should be and the State value is what the status actually is. So, a Target value of ONLINE and a State value of OFFLINE could mean one of two things: either the resource is coming online but hadn’t quite made it when you ran the command, or the resource has a problem and cannot come online. In the latter case, it would be time to dive into the log files. As we can see from this output, all is well.
[grid@racnode1 ~]$ crs_stat -t Name Type Target State Host ------------------------------------------------------------ ora.CRS.dg ora....up.type ONLINE ONLINE racnode1 ora.DATA.dg ora....up.type ONLINE ONLINE racnode1 ora.FRA.dg ora....up.type ONLINE ONLINE racnode1 ora....ER.lsnr ora....er.type ONLINE ONLINE racnode1 ora....N1.lsnr ora....er.type ONLINE ONLINE racnode2 ora....N2.lsnr ora....er.type ONLINE ONLINE racnode1 ora....N3.lsnr ora....er.type ONLINE ONLINE racnode1 ora.MGMTLSNR ora....nr.type ONLINE ONLINE racnode1 ora.asm ora.asm.type ONLINE ONLINE racnode1 ora.cvu ora.cvu.type ONLINE ONLINE racnode1 ora.mgmtdb ora....db.type ONLINE ONLINE racnode1 ora....network ora....rk.type ONLINE ONLINE racnode1 ora.oc4j ora.oc4j.type ONLINE ONLINE racnode1 ora.ons ora.ons.type ONLINE ONLINE racnode1 ora.rac1db.db ora....se.type ONLINE ONLINE racnode1 ora....SM1.asm application ONLINE ONLINE racnode1 ora....E1.lsnr application ONLINE ONLINE racnode1 ora....de1.ons application ONLINE ONLINE racnode1 ora....de1.vip ora....t1.type ONLINE ONLINE racnode1 ora....SM2.asm application ONLINE ONLINE racnode2 ora....E2.lsnr application ONLINE ONLINE racnode2 ora....de2.ons application ONLINE ONLINE racnode2 ora....de2.vip ora....t1.type ONLINE ONLINE racnode2 ora.scan1.vip ora....ip.type ONLINE ONLINE racnode2 ora.scan2.vip ora....ip.type ONLINE ONLINE racnode1 ora.scan3.vip ora....ip.type ONLINE ONLINE racnode1
This is the preferred CRS resource status check command:
[grid@racnode1 ~]$ crsctl status resource -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.DATA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.FRA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.LISTENER.lsnr ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.asm ONLINE ONLINE racnode1 Started,STABLE ONLINE ONLINE racnode2 Started,STABLE ora.net1.network ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.ons ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE racnode1 169.254.182.6 200.20 0.30.11,STABLE ora.cvu 1 ONLINE ONLINE racnode1 STABLE ora.mgmtdb 1 ONLINE ONLINE racnode1 Open,STABLE ora.oc4j 1 ONLINE ONLINE racnode1 STABLE ora.rac1db.db 1 ONLINE ONLINE racnode1 Open,STABLE 2 ONLINE ONLINE racnode2 Open,STABLE ora.racnode1.vip 1 ONLINE ONLINE racnode1 STABLE ora.racnode2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan1.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan2.vip 1 ONLINE ONLINE racnode1 STABLE ora.scan3.vip 1 ONLINE ONLINE racnode1 STABLE --------------------------------------------------------------------------------
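With a long resource list, it can be handy to print only the resources whose TARGET and STATE disagree rather than scanning the whole table. Here is a minimal sketch using the non-tabular crsctl output, assuming the grid environment is set; it simply reports any resource whose TARGET includes ONLINE while its STATE shows OFFLINE:

[grid@racnode1 ~]$ crsctl status resource | awk -F= '
>   /^NAME=/   { name = $2 }
>   /^TARGET=/ { target = $2 }
>   /^STATE=/  { if (target ~ /ONLINE/ && $2 ~ /OFFLINE/) print name, "->", $2 }'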
Test #1C. Check: Cluster Verification.
The 3 main services are online on all nodes. Good news.
[grid@racnode1 ~]$ crsctl check cluster -all
**************************************************************
racnode1:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
racnode2:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
Test #1D. Check: CRS Verification.
Similar to Test 1C, this output shows the HA services are all OK too.
[grid@racnode1 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online

[grid@racnode2 ~]$ crsctl check crs
CRS-4638: Oracle High Availability Services is online
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
Test #1E. Check: CTSS.
Remember we’re not using NTP for server time synchronization, so Oracle’s CTSS should be doing the work for us which it seems to be.
[grid@racnode1 ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0

[grid@racnode2 ~]$ crsctl check ctss
CRS-4701: The Cluster Time Synchronization Service is in Active mode.
CRS-4702: Offset (in msec): 0
Test #1F. Check: DNS.
Remember we configured DNS on the OEM Cloud Control server (oraemcc.mynet.com, 200.200.10.16). It seems to be working for both RAC nodes.
[grid@racnode1 ~]$ crsctl query dns -servers
CRS-10018: the following configuration was found on the system:
CRS-10019: There are 1 domains in search order. They are: mynet.com
CRS-10022: There are 1 name servers. They are: 200.200.10.16
CRS-10020: number of retry attempts for name lookup is: 4
CRS-10021: timeout for each name lookup is: 5

[grid@racnode2 ~]$ crsctl query dns -servers
CRS-10018: the following configuration was found on the system:
CRS-10019: There are 1 domains in search order. They are: mynet.com
CRS-10022: There are 1 name servers. They are: 200.200.10.16
CRS-10020: number of retry attempts for name lookup is: 4
CRS-10021: timeout for each name lookup is: 5
Test #1G. Check: Votedisk(s).
The Votedisk seems fine, but best practice says we should have more than just one and an odd number of them. We might want to fix this later.
[grid@racnode1 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                        Disk group
--  -----    -----------------                ---------                        ---------
 1. ONLINE   6623c2c7b3a24fc5bffd28b93c053000 (/dev/oracleasm/disks/ASMDISK01) [CRS]
Located 1 voting disk(s).
Test #1H. Check: OCR File(s).
We can have up to 5 OCR files and we currently only have one. We might want to fix that later. Also, to perform the logical corruption check, we need to run this ocrcheck command as root.
[grid@racnode1 ~]$ cat /etc/oracle/ocr.loc #Device/file +CRS getting replaced by device +CRS/cluster1/OCRFILE/registry.255.898728137 ocrconfig_loc=+CRS/cluster1/OCRFILE/registry.255.898728137 [grid@racnode1 ~]$ ocrcheck -details Status of Oracle Cluster Registry is as follows : Version : 4 Total space (kbytes) : 409568 Used space (kbytes) : 1588 Available space (kbytes) : 407980 ID : 1641944593 Device/File Name : +CRS/cluster1/OCRFILE/registry.255.898728137 Device/File integrity check succeeded Device/File not configured Device/File not configured Device/File not configured Device/File not configured Cluster registry integrity check succeeded Logical corruption check bypassed due to non-privileged user
[root@racnode1 ~]# /u01/app/12.1.0/grid/bin/ocrcheck -details Status of Oracle Cluster Registry is as follows : Version : 4 Total space (kbytes) : 409568 Used space (kbytes) : 1588 Available space (kbytes) : 407980 ID : 1641944593 Device/File Name : +CRS/cluster1/OCRFILE/registry.255.898728137 Device/File integrity check succeeded Device/File not configured Device/File not configured Device/File not configured Device/File not configured Cluster registry integrity check succeeded Logical corruption check succeeded
Test #1I. Check: OCR Backups.
The OCR is backed up automatically every 4 hours. If the cluster has been up and running long enough you can see the backup files. Running this command on any node should produce the same output.
[grid@racnode1 ~]$ ocrconfig -showbackup
racnode1     2016/01/27 10:15:35     /u01/app/12.1.0/grid/cdata/cluster1/backup00.ocr     0
racnode1     2016/01/27 06:14:48     /u01/app/12.1.0/grid/cdata/cluster1/backup01.ocr     0
racnode1     2016/01/27 02:14:11     /u01/app/12.1.0/grid/cdata/cluster1/backup02.ocr     0
racnode1     2016/01/26 18:13:01     /u01/app/12.1.0/grid/cdata/cluster1/day.ocr          0
racnode1     2016/01/15 04:31:01     /u01/app/12.1.0/grid/cdata/cluster1/week.ocr         0
PROT-25: Manual backups for the Oracle Cluster Registry are not available
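The PROT-25 message at the end simply means no manual backups have been taken yet. If you want an on-demand backup, say before risky maintenance, ocrconfig can take one; it must be run as root. For example (output not shown here):

[root@racnode1 ~]# /u01/app/12.1.0/grid/bin/ocrconfig -manualbackup
[root@racnode1 ~]# /u01/app/12.1.0/grid/bin/ocrconfig -showbackup manual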
Test #1J. Check: OLR.
The OLR plays an important role in starting the cluster when the OCR is stored in ASM. Better make sure each node’s OLR is alive and well.
[grid@racnode1 ~]$ cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/racnode1.olr
crs_home=/u01/app/12.1.0/grid

[grid@racnode1 ~]$ ocrcheck -local -config
Oracle Local Registry configuration is :
        Device/File Name         : /u01/app/12.1.0/grid/cdata/racnode1.olr

[grid@racnode1 ~]$ ls -l /u01/app/12.1.0/grid/cdata/racnode1.olr
-rw------- 1 root oinstall 503484416 Feb  6 14:22 /u01/app/12.1.0/grid/cdata/racnode1.olr

[grid@racnode2 ~]$ cat /etc/oracle/olr.loc
olrconfig_loc=/u01/app/12.1.0/grid/cdata/racnode2.olr
crs_home=/u01/app/12.1.0/grid

[grid@racnode2 ~]$ ocrcheck -local -config
Oracle Local Registry configuration is :
        Device/File Name         : /u01/app/12.1.0/grid/cdata/racnode2.olr

[grid@racnode2 ~]$ ls -l /u01/app/12.1.0/grid/cdata/racnode2.olr
-rw------- 1 root oinstall 503484416 Feb  6 18:24 /u01/app/12.1.0/grid/cdata/racnode2.olr
Test #1K. Check: GPnP.
The Grid Plug ‘n’ Play profile is an XML file which also plays an important role in cluster startup. It contains important node configuration information, including the storage location of some GI resources, the node’s networking configuration and so on. Make sure it’s where it ought to be and eyeball its contents to make sure they appear accurate. They should be, as they’re maintained by the GPnP daemon (gpnpd.bin).
[grid@racnode1 ~]$ cd /u01/app/12.1.0/grid/gpnp/racnode1/profiles/peer
[grid@racnode1 peer]$ ls -l
-rw-r--r-- 1 grid oinstall 1908 Dec 17 22:44 profile.old
-rw-r--r-- 1 grid oinstall 1850 Dec 17 22:36 profile_orig.xml
-rw-r--r-- 1 grid oinstall 1908 Dec 17 22:44 profile.xml

[grid@racnode2 ~]$ cd /u01/app/12.1.0/grid/gpnp/racnode2/profiles/peer
[grid@racnode2 peer]$ ls -l
-rw-r--r-- 1 grid oinstall 1850 Dec 17 23:05 profile_orig.xml
-rw-r--r-- 1 grid oinstall 1908 Dec 17 23:05 profile.xml
Test #1L. Check: SCAN VIPs.
The SCAN Listeners each need a VIP on which to operate. These commands show us they’re up and running and open for business.
[grid@racnode1 ~]$ srvctl status scan SCAN VIP scan1 is enabled SCAN VIP scan1 is running on node racnode2 SCAN VIP scan2 is enabled SCAN VIP scan2 is running on node racnode2 SCAN VIP scan3 is enabled SCAN VIP scan3 is running on node racnode1 [grid@racnode1 ~]$ srvctl config scan SCAN name: cluster1-scan.mynet.com, Network: 1 Subnet IPv4: 200.200.10.0/255.255.255.0/eth0, static Subnet IPv6: SCAN 0 IPv4 VIP: 200.200.10.122 SCAN VIP is enabled. SCAN VIP is individually enabled on nodes: SCAN VIP is individually disabled on nodes: SCAN 1 IPv4 VIP: 200.200.10.120 SCAN VIP is enabled. SCAN VIP is individually enabled on nodes: SCAN VIP is individually disabled on nodes: SCAN 2 IPv4 VIP: 200.200.10.121 SCAN VIP is enabled. SCAN VIP is individually enabled on nodes: SCAN VIP is individually disabled on nodes: [grid@racnode1 ~]$ ps -ef | grep -v grep | grep SCAN grid 7739 1 0 Jan26 ? 00:00:41 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN3 -no_crs_notify -inherit [grid@racnode2 ~]$ ps -ef | grep -v grep | grep SCAN grid 16190 1 0 Jan26 ? 00:00:43 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN1 -no_crs_notify -inherit grid 7736 1 0 Jan26 ? 00:00:41 /u01/app/12.1.0/grid/bin/tnslsnr LISTENER_SCAN2 -no_crs_notify -inherit [grid@racnode1 ~]$ ping 200.200.10.120 PING 200.200.10.120 (200.200.10.120) 56(84) bytes of data. 64 bytes from 192.168.0.120: icmp_seq=1 ttl=64 time=0.288 ms 64 bytes from 192.168.0.120: icmp_seq=2 ttl=64 time=0.152 ms --- 192.168.0.120 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1362ms rtt min/avg/max/mdev = 0.152/0.220/0.288/0.068 ms [grid@racnode1 ~]$ ping 200.200.10.121 PING 200.200.10.121 (200.200.10.121) 56(84) bytes of data. 64 bytes from 192.168.0.121: icmp_seq=1 ttl=64 time=0.072 ms 64 bytes from 192.168.0.121: icmp_seq=2 ttl=64 time=0.028 ms --- 192.168.0.121 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1311ms rtt min/avg/max/mdev = 0.028/0.050/0.072/0.022 ms [grid@racnode1 ~]$ ping 200.200.10.122 PING 200.200.10.122 (200.200.10.122) 56(84) bytes of data. 64 bytes from 192.168.0.122: icmp_seq=1 ttl=64 time=0.310 ms 64 bytes from 192.168.0.122: icmp_seq=2 ttl=64 time=0.186 ms --- 192.168.0.122 ping statistics --- 2 packets transmitted, 2 received, 0% packet loss, time 1314ms rtt min/avg/max/mdev = 0.186/0.248/0.310/0.062 ms
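Since the SCAN VIPs are registered in DNS, it’s also worth confirming the SCAN name itself resolves to all three addresses. A quick check, assuming nslookup is installed on the node:

[grid@racnode1 ~]$ nslookup cluster1-scan.mynet.com

You should see 200.200.10.120, 200.200.10.121 and 200.200.10.122 returned, typically in a different order on each lookup if the DNS server is rotating the answers.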
Test #1M. Check: Node VIPs.
To avoid a lengthy TCP timeout in the event of a node failure, the failed node’s VIP needs to fail over to a surviving node. For that to happen, it needs to be enabled and pingable.
[grid@racnode1 ~]$ srvctl status vip -node racnode1 VIP racnode1-vip.mynet.com is enabled VIP racnode1-vip.mynet.com is running on node: racnode1 [grid@racnode1 ~]$ srvctl status vip -node racnode2 VIP racnode2-vip.mynet.com is enabled VIP racnode2-vip.mynet.com is running on node: racnode2 [grid@racnode1 ~]$ ping racnode1-vip.mynet.com PING racnode1-vip.mynet.com (200.200.10.111) 56(84) bytes of data. 64 bytes from racnode1-vip.mynet.com (200.200.10.111): icmp_seq=1 ttl=64 time=0.024 ms 64 bytes from racnode1-vip.mynet.com (200.200.10.111): icmp_seq=2 ttl=64 time=0.031 ms 64 bytes from racnode1-vip.mynet.com (200.200.10.111): icmp_seq=3 ttl=64 time=0.026 ms --- racnode1-vip.mynet.com ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2441ms rtt min/avg/max/mdev = 0.024/0.027/0.031/0.003 ms [grid@racnode1 ~]$ ping racnode2-vip.mynet.com PING racnode2-vip.mynet.com (200.200.10.112) 56(84) bytes of data. 64 bytes from racnode2-vip.mynet.com (200.200.10.112): icmp_seq=1 ttl=64 time=0.308 ms 64 bytes from racnode2-vip.mynet.com (200.200.10.112): icmp_seq=2 ttl=64 time=0.174 ms 64 bytes from racnode2-vip.mynet.com (200.200.10.112): icmp_seq=3 ttl=64 time=0.163 ms --- racnode2-vip.mynet.com ping statistics --- 3 packets transmitted, 3 received, 0% packet loss, time 2513ms rtt min/avg/max/mdev = 0.163/0.215/0.308/0.065 ms
Test #1N. Check: Nodeapps.
Nodeapps comprise a node’s VIP, network and ONS daemon. Check to see if they’re all up and running.
[grid@racnode1 ~]$ srvctl status nodeapps
VIP racnode1-vip.mynet.com is enabled
VIP racnode1-vip.mynet.com is running on node: racnode1
VIP racnode2-vip.mynet.com is enabled
VIP racnode2-vip.mynet.com is running on node: racnode2
Network is enabled
Network is running on node: racnode1
Network is running on node: racnode2
ONS is enabled
ONS daemon is running on node: racnode1
ONS daemon is running on node: racnode2
Test #1O. Check: Node Participation.
You might think you know which nodes make up your cluster, but does the cluster agree? In our case, yes it does!
[grid@racnode1 ~]$ olsnodes -n -i -s -t -a
racnode1        1       racnode1-vip.mynet.com  Active  Hub     Unpinned
racnode2        2       racnode2-vip.mynet.com  Active  Hub     Unpinned
Section #2: Check & Test ASM.
ASM has become the de facto storage standard for Oracle databases, so we’d better check the clustered version is healthy.
Test #2B. Check: ASM Instances.
Ensure the ASM instances are up, running and can accept logins.
[grid@racnode1 ~]$ srvctl status asm
ASM is running on racnode2,racnode1

[grid@racnode1 ~]$ sqlplus / as sysasm

SQL*Plus: Release 12.1.0.2.0 Production on Sat Feb 6 17:14:59 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to:
Oracle Database 12c Enterprise Edition Release 12.1.0.2.0 - 64bit Production
With the Real Application Clusters and Automatic Storage Management options

SQL>

[grid@racnode1 ~]$ asmcmd
ASMCMD> ls -l
State    Type    Rebal  Name
MOUNTED  EXTERN  N      CRS/
MOUNTED  EXTERN  N      DATA/
MOUNTED  EXTERN  N      FRA/
Test #2C. Check: ASM Diskgroups.
Next, check that all the ASM Diskgroups are OK.
[grid@racnode1 ~]$ echo "select name from v\$asm_diskgroup;" | sqlplus -s / as sysasm

NAME
------------------------------
FRA
DATA
CRS

[grid@racnode1 ~]$ for dg in FRA DATA CRS
> do
>   srvctl status diskgroup -diskgroup ${dg} -detail
> done
Disk Group FRA is running on racnode2,racnode1
Disk Group FRA is enabled
Disk Group DATA is running on racnode2,racnode1
Disk Group DATA is enabled
Disk Group CRS is running on racnode2,racnode1
Disk Group CRS is enabled
Test #2D. Check: ASM Diskgroup Metadata.
Check the ASM Diskgroup metadata is present and correct. No error messages means it is.
[grid@racnode1 ~]$ sqlplus / as sysasm

SQL> alter diskgroup crs check all;
Diskgroup altered.

SQL> alter diskgroup data check all;
Diskgroup altered.

[grid@racnode1 ~]$ asmcmd
ASMCMD> chkdg FRA
Diskgroup altered.
Test #2E. Check: ASM Disks.
Ensure all the ASM Disks you have configured are visible.
[grid@racnode1 ~]$ sqlplus / as sysasm SQL> select dg.name DG, d.name Disk, decode(d.GROUP_NUMBER, 0,'Unallocated', 'In Use') State, d.path from v$asm_disk d left outer join v$asm_diskgroup dg on dg.group_number = d.group_number order by dg.name, d.path; DG DISK STATE PATH ---------- ---------- ----------- ------------------------------ CRS CRS_0000 In Use /dev/oracleasm/disks/ASMDISK01 DATA DATA_0000 In Use /dev/oracleasm/disks/ASMDISK02 DATA DATA_0001 In Use /dev/oracleasm/disks/ASMDISK03 DATA DATA_0002 In Use /dev/oracleasm/disks/ASMDISK06 FRA FRA_0000 In Use /dev/oracleasm/disks/ASMDISK04 FRA FRA_0001 In Use /dev/oracleasm/disks/ASMDISK05 Unallocated /dev/oracleasm/disks/ASMDISK07 Unallocated /dev/oracleasm/disks/ASMDISK08 Unallocated /dev/oracleasm/disks/ASMDISK09 Unallocated /dev/oracleasm/disks/ASMDISK10 ASMCMD> lsdsk --discovery Path /dev/oracleasm/disks/ASMDISK01 /dev/oracleasm/disks/ASMDISK02 /dev/oracleasm/disks/ASMDISK03 /dev/oracleasm/disks/ASMDISK04 /dev/oracleasm/disks/ASMDISK05 /dev/oracleasm/disks/ASMDISK06 /dev/oracleasm/disks/ASMDISK07 /dev/oracleasm/disks/ASMDISK08 /dev/oracleasm/disks/ASMDISK09 /dev/oracleasm/disks/ASMDISK10
Test #2F. Check: ASM Clients.
Verify what is connected to each ASM instance.
[grid@racnode1 ~]$ sqlplus / as sysasm SQL> select dg.NAME, c.INSTANCE_NAME, c.DB_NAME, c.CLUSTER_NAME, c.STATUS from v$asm_client c, v$asm_diskgroup dg where c.GROUP_NUMBER = dg.GROUP_NUMBER order by c.DB_NAME; NAME INSTANCE_N DB_NAME CLUSTER_NAME STATUS ---------- ---------- -------- ------------ ------------ DATA +ASM1 +ASM cluster1 CONNECTED CRS +ASM1 +ASM cluster1 CONNECTED FRA RAC1DB1 RAC1DB cluster1 CONNECTED DATA RAC1DB1 RAC1DB cluster1 CONNECTED CRS -MGMTDB _mgmtdb cluster1 CONNECTED
[grid@racnode2 ~]$ sqlplus / as sysasm NAME INSTANCE_N DB_NAME CLUSTER_NAME STATUS ---------- ---------- -------- ------------ ------------ CRS +ASM2 +ASM cluster1 CONNECTED DATA +ASM2 +ASM cluster1 CONNECTED DATA RAC1DB2 RAC1DB cluster1 CONNECTED FRA RAC1DB2 RAC1DB cluster1 CONNECTED
Section #3: Check & Test Databases & Instances.
Test #3C. Check: User RAC Database & Instance.
Confirm where the database instances are running.
[oracle@racnode1 ~]$ srvctl status database -d RAC1DB
Instance RAC1DB1 is running on node racnode1
Instance RAC1DB2 is running on node racnode2
Test #3D. Check: User RAC Database DBVERIFY.
For the sake of brevity, we’ll run the DBVERIFY executable (dbv) for only one datafile.
[oracle@racnode1 ~]$ sqlplus / as sysdba SQL> select name from v$datafile; NAME -------------------------------------------------------------------------------- +DATA/RAC1DB/DATAFILE/system.258.898779945 +DATA/RAC1DB/DATAFILE/sysaux.257.898779761 +DATA/RAC1DB/DATAFILE/undotbs1.260.898780207 +DATA/RAC1DB/DATAFILE/undotbs2.265.898783783 +DATA/RAC1DB/DATAFILE/users.259.898780201 [oracle@racnode1 scripts]$ cat dbv_parfile.txt FILE='+DATA/RAC1DB/DATAFILE/users.259.898780201' LOGFILE='users_datafile.out' FEEDBACK=10 USERID=system/<removed> [oracle@racnode1 scripts]$ dbv parfile=dbv_parfile.txt DBVERIFY: Release 12.1.0.2.0 - Production on Mon Feb 15 11:54:52 2016 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved.
[oracle@racnode1 scripts]$ cat users_datafile.out DBVERIFY: Release 12.1.0.2.0 - Production on Mon Feb 15 11:54:52 2016 Copyright (c) 1982, 2014, Oracle and/or its affiliates. All rights reserved. DBVERIFY - Verification starting : FILE = +DATA/RAC1DB/DATAFILE/users.259.898780201 DBVERIFY - Verification complete Total Pages Examined : 640 Total Pages Processed (Data) : 30 Total Pages Failing (Data) : 0 Total Pages Processed (Index): 5 Total Pages Failing (Index): 0 Total Pages Processed (Other): 588 Total Pages Processed (Seg) : 0 Total Pages Failing (Seg) : 0 Total Pages Empty : 17 Total Pages Marked Corrupt : 0 Total Pages Influx : 0 Total Pages Encrypted : 0 Highest block SCN : 0 (0.0)
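If you wanted to verify every datafile rather than just one, you could drive dbv from the datafile list itself. This is a rough sketch, assuming the RAC1DB1 environment is set; the <password> placeholder stands in for a real SYSTEM password, just like the <removed> value in the parameter file above:

[oracle@racnode1 scripts]$ sqlplus -s / as sysdba <<'EOF' > datafiles.txt
set heading off feedback off pages 0
select name from v$datafile;
EOF
[oracle@racnode1 scripts]$ while read df
> do
>   dbv file="${df}" userid=system/<password> logfile="$(basename ${df}).out" feedback=10
> done < datafiles.txt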
Test #3H. Test: Restart User RAC Database.
[oracle@racnode1 ~]$ srvctl stop database -d RAC1DB [grid@racnode2 ~]$ crsctl status resource -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.DATA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.FRA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.LISTENER.lsnr ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.asm ONLINE ONLINE racnode1 Started,STABLE ONLINE ONLINE racnode2 Started,STABLE ora.net1.network ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.ons ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE racnode1 169.254.182.6 200.20 0.30.11,STABLE ora.cvu 1 OFFLINE OFFLINE STABLE ora.mgmtdb 1 ONLINE ONLINE racnode1 Open,STABLE ora.oc4j 1 ONLINE ONLINE racnode1 STABLE ora.rac1db.db 1 OFFLINE OFFLINE Instance Shutdown,ST ABLE 2 OFFLINE OFFLINE Instance Shutdown,ST ABLE ora.racnode1.vip 1 ONLINE ONLINE racnode1 STABLE ora.racnode2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan1.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan3.vip 1 ONLINE ONLINE racnode1 STABLE -------------------------------------------------------------------------------- [oracle@racnode1 ~]$ srvctl start database -d RAC1DB [grid@racnode2 ~]$ crsctl status resource -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.DATA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.FRA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.LISTENER.lsnr ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.asm ONLINE ONLINE racnode1 Started,STABLE ONLINE ONLINE racnode2 Started,STABLE ora.net1.network ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.ons ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE racnode1 169.254.182.6 200.20 0.30.11,STABLE ora.cvu 1 OFFLINE OFFLINE STABLE ora.mgmtdb 1 ONLINE ONLINE racnode1 Open,STABLE ora.oc4j 1 ONLINE ONLINE racnode1 STABLE ora.rac1db.db 1 ONLINE ONLINE racnode1 Open,STABLE 2 ONLINE ONLINE racnode2 Open,STABLE ora.racnode1.vip 1 ONLINE ONLINE racnode1 STABLE ora.racnode2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan1.vip 1 ONLINE 
ONLINE racnode2 STABLE ora.scan2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan3.vip 1 ONLINE ONLINE racnode1 STABLE --------------------------------------------------------------------------------
Test #3I. Test: Planned User Instance Shutdown.
[oracle@racnode1 ~]$ srvctl stop instance -d RAC1DB -i RAC1DB2 [grid@racnode2 ~]$ crsctl status resource -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.DATA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.FRA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.LISTENER.lsnr ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.asm ONLINE ONLINE racnode1 Started,STABLE ONLINE ONLINE racnode2 Started,STABLE ora.net1.network ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.ons ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE racnode1 169.254.182.6 200.20 0.30.11,STABLE ora.cvu 1 OFFLINE OFFLINE STABLE ora.mgmtdb 1 ONLINE ONLINE racnode1 Open,STABLE ora.oc4j 1 ONLINE ONLINE racnode1 STABLE ora.rac1db.db 1 ONLINE ONLINE racnode1 Open,STABLE 2 OFFLINE OFFLINE Instance Shutdown,ST ABLE ora.racnode1.vip 1 ONLINE ONLINE racnode1 STABLE ora.racnode2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan1.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan3.vip 1 ONLINE ONLINE racnode1 STABLE --------------------------------------------------------------------------------
Test #3J. Test: User Session TAF.
[oracle@oraemcc ~]$ cat $ORACLE_HOME/network/admin/tnsnames.ora
RAC1DB =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = cluster1-scan.mynet.com)(PORT = 1521))
    (CONNECT_DATA =
      (SERVER = DEDICATED)
      (SERVICE_NAME = RAC1DB.mynet.com)
      (FAILOVER_MODE =
        (TYPE=SESSION)
        (METHOD=BASIC)
        (RETRIES=10)
        (DELAY=10)
      )
    )
  )
[oracle@oraemcc ~]$ sqlplus system@rac1db SQL> select instance_name from v$instance; INSTANCE_NAME ---------------- RAC1DB1 [oracle@racnode1 ~]$ srvctl stop instance -d RAC1DB -i RAC1DB1 [grid@racnode2 ~]$ crsctl status resource -t ... ora.rac1db.db 1 OFFLINE OFFLINE Instance Shutdown,ST ABLE 2 ONLINE ONLINE racnode2 Open,STABLE ...
SQL> select instance_name from v$instance; select instance_name from v$instance * ERROR at line 1: ORA-25408: can not safely replay call SQL> / INSTANCE_NAME ---------------- RAC1DB2
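You can also confirm from the database side that the session really did fail over rather than simply reconnect. The FAILOVER_TYPE, FAILOVER_METHOD and FAILED_OVER columns of GV$SESSION record this; a quick check from a SYSDBA session might look like the sketch below, where FAILED_OVER should show YES for the SYSTEM session once it has moved to RAC1DB2:

SQL> select inst_id, username, failover_type, failover_method, failed_over
     from gv$session
     where username = 'SYSTEM';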
Section #4: Test System & Cluster.
Test #4C. Test: Cluster Restart.
[root@racnode1 ~]# cd /u01/app/12.1.0/grid/bin
[root@racnode1 bin]# ./crsctl stop cluster -all
The output from this command is quite verbose, but it can be viewed here.
[root@racnode1 bin]# ./crsctl check cluster -all ************************************************************** racnode1: CRS-4535: Cannot communicate with Cluster Ready Services CRS-4530: Communications failure contacting Cluster Synchronization Services daemon CRS-4534: Cannot communicate with Event Manager ************************************************************** racnode2: CRS-4535: Cannot communicate with Cluster Ready Services CRS-4530: Communications failure contacting Cluster Synchronization Services daemon CRS-4534: Cannot communicate with Event Manager ************************************************************** [grid@racnode2 ~]$ crsctl status resource -t CRS-4535: Cannot communicate with Cluster Ready Services CRS-4000: Command Status failed, or completed with errors. [root@racnode1 bin]# ./crsctl start cluster -all
The output from the start cluster command is also a little verbose. It can be viewed here.
Note, despite the cluster having been shut down and restarted, the instance RAC1DB1 on racnode1 remained down. This is because that instance was shut down in Test #3J. Restarting the cluster does NOT automatically restart cluster resources which had previously been shut down manually. This is expected behavior.
[grid@racnode2 ~]$ crsctl status resource -t -------------------------------------------------------------------------------- Name Target State Server State details -------------------------------------------------------------------------------- Local Resources -------------------------------------------------------------------------------- ora.CRS.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.DATA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.FRA.dg ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.LISTENER.lsnr ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.asm ONLINE ONLINE racnode1 Started,STABLE ONLINE ONLINE racnode2 Started,STABLE ora.net1.network ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE ora.ons ONLINE ONLINE racnode1 STABLE ONLINE ONLINE racnode2 STABLE -------------------------------------------------------------------------------- Cluster Resources -------------------------------------------------------------------------------- ora.LISTENER_SCAN1.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN2.lsnr 1 ONLINE ONLINE racnode2 STABLE ora.LISTENER_SCAN3.lsnr 1 ONLINE ONLINE racnode1 STABLE ora.MGMTLSNR 1 ONLINE ONLINE racnode1 169.254.182.6 192.16 8.2.11,STABLE ora.cvu 1 OFFLINE OFFLINE STABLE ora.mgmtdb 1 ONLINE ONLINE racnode1 Open,STABLE ora.oc4j 1 ONLINE ONLINE racnode1 STABLE ora.rac1db.db 1 OFFLINE OFFLINE Instance Shutdown,ST ABLE 2 ONLINE ONLINE racnode2 Open,STABLE ora.racnode1.vip 1 ONLINE ONLINE racnode1 STABLE ora.racnode2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan1.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan2.vip 1 ONLINE ONLINE racnode2 STABLE ora.scan3.vip 1 ONLINE ONLINE racnode1 STABLE --------------------------------------------------------------------------------
Section #5: Health Checks & ORAchk.
Test #5A. Check: Cluster Health Check.
There is a -html option for the cluvfy command which writes its output in HTML format to this location by default:
/u01/app/grid/crsdata/@global/cvu/report/html/cvucheckreport.html
The HTML version is very useful because some of the references it contains are hyperlinks to Oracle’s technical support documentation. How convenient is that?
Note, running the cluvfy command with the -html option will overwrite any existing cvucheckreport.html file.
[grid@racnode1 ~]$ cluvfy comp healthcheck -collect cluster -bestpractice > cluster_hc.txt
An example of this command’s output can be viewed here. Looks like we got busted for not using Jumbo Frames!
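To produce the HTML version described above, the same check can simply be re-run with the -html flag added, for example:

[grid@racnode1 ~]$ cluvfy comp healthcheck -collect cluster -bestpractice -html

The report should then land at the default cvucheckreport.html location shown earlier.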
Test #5B. Check: RAC Database Health Check.
As with Test #5A, cluvfy’s -html option writes the report to the same default location and will overwrite any existing cvucheckreport.html file there.
[grid@racnode1 ~]$ cluvfy comp healthcheck -collect database -bestpractice > database_hc.txt
An example of this command’s output can be seen here. Looks like a couple of things could do with some attention.
Test #5C. Check: Run ORAchk.
ORAchk is a utility provided by Oracle Support which analyzes your operating system setup, Oracle software installation(s) and configurations and compares them with Oracle best practices, recommendations and available patches. The objective is to identify known problem areas before they cause disruption to database services. ORAchk can only be downloaded via MOS, Doc ID 1268927.2. As the oracle software installation user (usually oracle), you unzip the ORAchk zip file, then run the orachk command with the required parameters. For example:
[oracle@racnode1 ORAchk]$ ./orachk -a -o verbose
The great thing about ORAchk is it only analyzes the things it needs to or you want it to. Its output is saved to a timestamped directory, the main component being a large HTML report file. When opened in a browser, the report shows an overall System Health Score out of 100. For each component checked, the report provides individual test feedback in the form of a Pass, Fail, Warning or Info message. You can view the details of each test, which provide recommendations and even links to technical support documentation showing how to fix the problem. In some cases, the details contain actual SQL code which will fix the problem.
The not so great thing about ORAchk is that it is very fussy and, as a result, does not always generate correct or accurate results. For example, some perfectly valid unalias commands in the .bash_profile of the grid user caused ORAchk to fail multiple tests and even time out running others. Code timing out is especially problematic, to the extent that there are two environment variables you can set to allow longer timeout periods. These are documented in the User Guide. The values I have used which (mostly) worked are:
RAT_ROOT_TIMEOUT=800
RAT_TIMEOUT=280
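These are ordinary environment variables, so one way to apply them (a sketch) is simply to export them in the shell session used to launch ORAchk:

[oracle@racnode1 ORAchk]$ export RAT_ROOT_TIMEOUT=800
[oracle@racnode1 ORAchk]$ export RAT_TIMEOUT=280
[oracle@racnode1 ORAchk]$ ./orachk -a -o verbose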
Suffice it to say, ORAchk isn’t perfect but is still worth trying. Just make sure you verify its results.
Task #4: Common Administration Tasks.
This task will see us perform a few corrective actions based upon the output of the health checks and ORAchk. We’ll also complete some admin covering the things you’d typically do to a cluster and RAC database. Are we having fun yet? 🙂
We have 9 common tasks to complete. Links below:
- Task 4.1 Enable Archiving for the RAC Database.
- Task 4.2 Add an Additional OCR File.
- Task 4.3 Add an Additional Votedisk.
- Task 4.4 Implement oswatcher.
- Task 4.5 Check I/O Calibration.
- Task 4.6 Apply a Patch.
- Task 4.7 Apply a PSU.
- Task 4.8 Add a Node to the Cluster.
- Task 4.9 Delete a Node from the Cluster.
Task #4.1: Enable Archiving for the RAC Database.
ORAchk highlighted our RAC database is not running in archivelog mode. So let’s fix that.
Task #4.1a: Review Parameters.
The normal parameters used to control the archiving of the Online Redo Logs have no effect when using ASM and a Fast Recovery Area. The parameters which do matter are these:
SQL> show parameter db_recovery_file_dest

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
db_recovery_file_dest                string      +FRA
db_recovery_file_dest_size           big integer 4785M
The parameter db_recovery_file_dest effectively trumps the traditional log_archive_dest_1 parameter and ensures the archived Online Redo Logs are stored in the Fast Recovery Area. In our case, that happens to be implemented as an ASM Diskgroup called +FRA. Also, the traditional parameter log_archive_format, which defaults to %t_%s_%r.dbf, has no effect since the archived Online Redo Log file names will follow the OMF standard naming convention.
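Both of these parameters are dynamic, so if the destination or its size ever needed changing it could be done across all instances without a restart. A sketch, using the current values shown above purely as examples:

SQL> alter system set db_recovery_file_dest_size = 4785M scope=both sid='*';
SQL> alter system set db_recovery_file_dest = '+FRA' scope=both sid='*';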
Task #4.1b: Shutdown the RAC Database.
[oracle@racnode1 ~]$ srvctl stop database -d RAC1DB
Task #4.1c: Start One Instance.
[oracle@racnode1 ~]$ . oraenv
ORACLE_SID = [RAC1DB1] ?
The Oracle base remains unchanged with value /u01/app/oracle

[oracle@racnode1 ~]$ sqlplus / as sysdba

SQL*Plus: Release 12.1.0.2.0 Production on Tue Feb 16 21:02:46 2016
Copyright (c) 1982, 2014, Oracle. All rights reserved.

Connected to an idle instance.

SQL> startup mount
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size                  2926472 bytes
Variable Size            1275070584 bytes
Database Buffers          855638016 bytes
Redo Buffers               13848576 bytes
Database mounted.
Task #4.1d: Enable Archiving & Shutdown the Instance.
SQL> alter database archivelog;
Database altered.

SQL> shutdown immediate
ORA-01109: database not open
Database dismounted.
ORACLE instance shut down.

SQL> exit
Task #4.1e: Restart the RAC Database.
[oracle@racnode1 ~]$ srvctl start database -d RAC1DB
Task #4.1f: Verify Archivelog Mode & Generate Archived Online Redo Logs.
From racnode1 (Thread #1 – RAC1DB1):
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     316
Next log sequence to archive   317
Current log sequence           317

SQL> alter system switch logfile;
System altered.

SQL> select name from v$archived_log;

NAME
--------------------------------------------------------------------------------
+FRA/RAC1DB/ARCHIVELOG/2016_02_16/thread_1_seq_317.261.903993279
From racnode2 (Thread #2 – RAC1DB2):
SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     265
Next log sequence to archive   265
Current log sequence           266

SQL> alter system switch logfile;
System altered.

SQL> select name from v$archived_log;

NAME
--------------------------------------------------------------------------------
+FRA/RAC1DB/ARCHIVELOG/2016_02_16/thread_1_seq_317.261.903993279
+FRA/RAC1DB/ARCHIVELOG/2016_02_16/thread_2_seq_265.262.903993427
We can also see the archived Online Redo Logs via ASMCMD:
ASMCMD> pwd
+FRA/RAC1DB/ARCHIVELOG/2016_02_16

ASMCMD> ls -l
Type        Redund  Striped  Time             Sys  Name
ARCHIVELOG  UNPROT  COARSE   FEB 16 21:00:00  Y    thread_1_seq_317.261.903993279
ARCHIVELOG  UNPROT  COARSE   FEB 16 21:00:00  Y    thread_2_seq_265.262.903993427
Task #4.2: Add an Additional OCR File.
We can have up to 5 OCR files and the fact Oracle allows for that suggests we ought to have more than just one. So let’s fix that.
Task #4.2a: Review Current OCR Configuration.
[grid@racnode2 ~]$ ocrcheck Status of Oracle Cluster Registry is as follows : Version : 4 Total space (kbytes) : 409568 Used space (kbytes) : 1588 Available space (kbytes) : 407980 ID : 1641944593 Device/File Name : +CRS Device/File integrity check succeeded Device/File not configured Device/File not configured Device/File not configured Device/File not configured Cluster registry integrity check succeeded Logical corruption check bypassed due to non-privileged user
So, we know we have just one OCR file located in the +CRS ASM Diskgroup. Let’s take a look at that file using ASMCMD:
ASMCMD> pwd
+crs/cluster1/OCRFILE

ASMCMD> ls -l
Type     Redund  Striped  Time             Sys  Name
OCRFILE  UNPROT  COARSE   FEB 16 22:00:00  Y    REGISTRY.255.898728137
Task #4.2b: Add an OCR Location.
I created an additional ASM Diskgroup called CRS2. The ocrconfig command must be run as root.
[root@racnode1 ~]# . oraenv
ORACLE_SID = [root] ? +ASM1
The Oracle base has been set to /u01/app/grid

[root@racnode1 ~]# ocrconfig -add +CRS2
Note, initial attempts to run this command repeatedly generated PROT-16 and PROC-23 errors. CRS was running fine on both nodes, the CRS log file contained no errors and the ASM instance alert log did not tell me anything of interest. A reboot of both nodes made this problem go away. Go figure. YMMV.
Task #4.2c: Review Updated OCR Configuration.
[grid@racnode1 ~]$ ocrcheck Status of Oracle Cluster Registry is as follows : Version : 4 Total space (kbytes) : 409568 Used space (kbytes) : 1604 Available space (kbytes) : 407964 ID : 1641944593 Device/File Name : +CRS Device/File integrity check succeeded Device/File Name : +CRS2 Device/File integrity check succeeded Device/File not configured Device/File not configured Device/File not configured Cluster registry integrity check succeeded Logical corruption check bypassed due to non-privileged user [grid@racnode1 ~]$ cat /etc/oracle/ocr.loc #Device/file getting replaced by device +CRS2/cluster1/OCRFILE/REGISTRY.255.904309547 ocrconfig_loc=+CRS/cluster1/OCRFILE/registry.255.898728137 ocrmirrorconfig_loc=+CRS2/cluster1/OCRFILE/REGISTRY.255.904309547 ASMCMD> pwd +crs2/cluster1/OCRFILE ASMCMD> ls -l Type Redund Striped Time Sys Name OCRFILE UNPROT COARSE FEB 20 14:00:00 Y REGISTRY.255.904309547
Task #4.3: Add an Additional Votedisk.
Oracle best practice dictates we should have an odd number of Votedisks. Yes, I know 1 is an odd number but we should probably have more than one. Let’s fix that.
Task #4.3a: Review Current Votedisk Configuration.
[grid@racnode2 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                        Disk group
--  -----    -----------------                ---------                        ---------
 1. ONLINE   6623c2c7b3a24fc5bffd28b93c053000 (/dev/oracleasm/disks/ASMDISK01) [CRS]
Located 1 voting disk(s).
Note, there is currently only ONE Votedisk because it resides in the +CRS ASM Diskgroup which is configured with external redundancy and a single ASM Disk. Hence, only ONE Votedisk.
Task #4.3b: Add Additional Votedisks.
Since we’re using ASM to store the Votedisk, we can’t just add more Votedisks to other ASM Diskgroups. It doesn’t work that way. Instead, the Votedisk redundancy (i.e. the number of Votedisks you have) depends upon the redundancy level of the ASM Diskgroup where the Votedisk(s) are stored. However, there is a mismatch between ASM Diskgroup redundancy and Oracle Clusterware Votedisk redundancy.
Redundancy Level | Minimum # of ASM Disks | # of Clusterware Votedisk Files |
---|---|---|
External | 1 | 1 |
Normal | 2 | 3 |
High | 3 | 5 |
What this means is, if you create an ASM Diskgroup with NORMAL redundancy and 2 ASM Disks, trying to create Votedisks in that ASM Diskgroup would fail. This is because NORMAL redundancy for Votedisks needs 3 ASM Disks, not 2. A NORMAL redundancy ASM Diskgroup with 3 ASM Disks could contain Votedisks, as all 3 disks would be used to store them.
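For reference, a Diskgroup like the one used below could be created from SQL*Plus along these lines. This is only a sketch, not necessarily how the Diskgroup in this guide was built; the compatible.asm attribute is included because Clusterware generally requires it to be 11.2 or higher before voting files can live in the Diskgroup:

[grid@racnode1 ~]$ sqlplus / as sysasm <<'EOF'
-- NORMAL redundancy with 4 disks; 3 of them can then hold voting files
CREATE DISKGROUP DATA2 NORMAL REDUNDANCY
  DISK '/dev/oracleasm/disks/ASMDISK06',
       '/dev/oracleasm/disks/ASMDISK08',
       '/dev/oracleasm/disks/ASMDISK09',
       '/dev/oracleasm/disks/ASMDISK10'
  ATTRIBUTE 'compatible.asm' = '12.1';
EOF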
To further make the point, I created a 4 disk NORMAL redundancy ASM Diskgroup called DATA2:
SQL> select dg.name, dg.type Redundancy, d.name, d.path from v$asm_disk d, v$asm_diskgroup dg where dg.group_number = d.group_number and dg.name = 'DATA2' order by d.name; NAME REDUNDANCY NAME PATH ---------- ---------- ---------- ------------------------------ DATA2 NORMAL DATA2_0000 /dev/oracleasm/disks/ASMDISK08 DATA2 NORMAL DATA2_0001 /dev/oracleasm/disks/ASMDISK09 DATA2 NORMAL DATA2_0002 /dev/oracleasm/disks/ASMDISK10 DATA2 NORMAL DATA2_0003 /dev/oracleasm/disks/ASMDISK06
Now we’ll move the Votedisk from the +CRS ASM Diskgroup to the +DATA2 ASM Diskgroup:
[grid@racnode2 ~]$ crsctl replace votedisk +DATA2
Successful addition of voting disk fd93674940d24f72bfcf07b3e1ac6d14.
Successful addition of voting disk fc9912809b3f4f3fbf64add0b80b34e4.
Successful addition of voting disk 5e840512f2904f41bfbc5dd468b4051b.
Successful deletion of voting disk 6623c2c7b3a24fc5bffd28b93c053000.
Successfully replaced voting disk group with +DATA2.
CRS-4266: Voting file(s) successfully replaced
Task #4.3c: Review Updated Votedisk Configuration.
As you can see, the NORMAL redundancy setting of the ASM Diskgroup created 3 Votedisks:
[grid@racnode2 ~]$ crsctl query css votedisk
##  STATE    File Universal Id                File Name                        Disk group
--  -----    -----------------                ---------                        ---------
 1. ONLINE   fd93674940d24f72bfcf07b3e1ac6d14 (/dev/oracleasm/disks/ASMDISK08) [DATA2]
 2. ONLINE   fc9912809b3f4f3fbf64add0b80b34e4 (/dev/oracleasm/disks/ASMDISK09) [DATA2]
 3. ONLINE   5e840512f2904f41bfbc5dd468b4051b (/dev/oracleasm/disks/ASMDISK10) [DATA2]
Located 3 voting disk(s).
Note, only 3 of the 4 ASM Disks in the +DATA2 ASM Diskgroup are used for Votedisks.
Task #4.4: Implement oswatcher.
ORAchk highlighted our RAC nodes are not running oswatcher. Let’s fix that.
The oswatcher utility is available via Oracle Support, Doc ID 301137.1. It captures and analyzes performance data from the underlying operating system. The capture aspect is documented in MOS Doc ID 1531223.1 and the analyze aspect is documented in MOS Doc ID 461053.1.
Task #4.4a: Download.
Within MOS, navigate to Doc ID 301137.1, then click the OSWatcher Download link. This will take you to the OSWatcher Download section of the document. There you’ll see another link called:
Click here to download OSWatcher for Solaris, Linux and HP-UX. (AIX users see link below)
Click that link and you’ll be prompted to download a file called oswbb733.tar (where the 733 means version 7.3.3). Once this file is downloaded, copy it to an appropriate location on ALL the nodes in your cluster.
Task #4.4b: Install.
Installing OSWatcher is simply a case of untarring the tar file, then changing the file permissions:
[oracle@racnode1 ~]$ cd ./media/oswatcher [oracle@racnode1 OSWatcher]$ tar -xvf oswbb733.tar [oracle@racnode1 oswatcher]$ ls -l drwxr-xr-x 4 oracle oinstall 4096 Feb 27 2015 oswbb -rw-r--r-- 1 oracle oinstall 6318080 Feb 21 15:10 oswbb733.tar [oracle@racnode1 oswatcher]$ cd oswbb [oracle@racnode1 oswbb]$ chmod 744 * [oracle@racnode1 oswbb]$ ls -l -rwxr--r-- 1 oracle oinstall 67 Jan 15 2014 call_du.sh -rwxr--r-- 1 oracle oinstall 68 Oct 7 2013 call_sar.sh -rwxr--r-- 1 oracle oinstall 71 Jan 7 2014 call_uptime.sh drwxr--r-- 4 oracle oinstall 4096 Sep 8 2014 docs -rwxr--r-- 1 oracle oinstall 626 Jan 15 2014 Example_extras.txt -rwxr--r-- 1 oracle oinstall 1864 Oct 7 2013 Exampleprivate.net -rwxr--r-- 1 oracle oinstall 772 May 8 2014 ifconfigsub.sh -rwxr--r-- 1 oracle oinstall 743 Oct 7 2013 iosub.sh -rwxr--r-- 1 oracle oinstall 1486 Jan 8 2014 ltop.sh -rwxr--r-- 1 oracle oinstall 542 Oct 7 2013 mpsub.sh -rwxr--r-- 1 oracle oinstall 740 Oct 7 2013 nfssub.sh -rwxr--r-- 1 oracle oinstall 5062 Sep 17 2014 OSWatcherFM.sh -rwxr--r-- 1 oracle oinstall 35108 Feb 27 2015 OSWatcher.sh -rwxr--r-- 1 oracle oinstall 233897 Feb 27 2015 oswbba.jar -rwxr--r-- 1 oracle oinstall 414 Oct 7 2013 oswib.sh -rwxr--r-- 1 oracle oinstall 435 Jan 8 2014 oswnet.sh -rwxr--r-- 1 oracle oinstall 825 Oct 7 2013 oswrds.sh -rwxr--r-- 1 oracle oinstall 524 Oct 7 2013 oswsub.sh -rwxr--r-- 1 oracle oinstall 1445 Oct 17 2013 psmemsub.sh drwxr--r-- 2 oracle oinstall 4096 May 2 2014 src -rwxr--r-- 1 oracle oinstall 2574 Feb 26 2015 startOSWbb.sh -rwxr--r-- 1 oracle oinstall 558 Apr 17 2014 stopOSWbb.sh -rwxr--r-- 1 oracle oinstall 746 Nov 6 2013 tarupfiles.sh -rwxr--r-- 1 oracle oinstall 4219 Nov 6 2013 tar_up_partial_archive.sh -rwxr--r-- 1 oracle oinstall 545 Feb 23 2015 vmsub.sh -rwxr--r-- 1 oracle oinstall 1486 Feb 26 2015 xtop.sh
User guides and README files are in the docs subdirectory.
Task #4.4c: Configure.
OSWatcher basically runs a series of OS monitoring commands at a predetermined frequency and stores their output, for a predetermined amount of time, in a predetermined location.
The defaults are a frequency of 30 seconds, with output data kept for the last 48 hours in an archive directory beneath where you untarred the software. These can all be customized.
In addition, Oracle Support recommends you use OSWatcher to monitor Grid Infrastructure’s private interconnect networks. This is done by creating a special file called private.net with the appropriate entries:
[oracle@racnode1 oswbb]$ cat private.net ###################################################################### #Linux Example ###################################################################### echo "zzz ***"`date` traceroute -r -F racnode1-priv traceroute -r -F racnode2-priv ###################################################################### # DO NOT DELETE THE FOLLOWING LINE!!!!!!!!!!!!!!!!!!!!! ###################################################################### rm locks/lock.file
Task #4.4d: Run OSWatcher.
The script which starts OSWatcher is called startOSWbb.sh. It can take up to 4 parameters:
Parameter Position | Meaning |
---|---|
1 | Sample Frequency (Seconds) |
2 | Archive History (Hours) |
3 | OS Compression Utility to Use |
4 | Archive Directory Path |
For example, to capture stats once per minute, keep those stats for 12 hours, use gzip to compress the archive data and store the archive data in /nas1/OSWATCHER/racnode1, use this command:
[oracle@racnode1 oswbb]$ ./startOSWbb.sh 60 12 gzip /nas1/OSWATCHER/racnode1 Info...Zip option IS specified. Info...OSW will use gzip to compress files. Testing for discovery of OS Utilities... VMSTAT found on your system. IOSTAT found on your system. MPSTAT found on your system. IFCONFIG found on your system. NETSTAT found on your system. TOP found on your system. Warning... /proc/slabinfo not found on your system. Testing for discovery of OS CPU COUNT oswbb is looking for the CPU COUNT on your system CPU COUNT will be used by oswbba to automatically look for cpu problems CPU COUNT found on your system. CPU COUNT = 1 Discovery completed. Starting OSWatcher v7.3.3 on Sun Feb 21 17:32:10 CST 2016 With SnapshotInterval = 60 With ArchiveInterval = 12 OSWatcher - Written by Carl Davis, Center of Expertise, Oracle Corporation For questions on install/usage please go to MOS (Note:301137.1) If you need further assistance or have comments or enhancement requests you can email me Carl.Davis@Oracle.com Data is stored in directory: /nas1/OSWATCHER/racnode1 Starting Data Collection...
Once OSWatcher is underway, it will echo heartbeat information to your screen at the frequency you specified (in our case every 60 seconds):
oswbb heartbeat:Sun Feb 21 17:40:16 CST 2016
It’s OK to log out of the session which initiated OSWatcher. The processes continue to run in the background:
[oracle@racnode1 ~]$ ps -ef | grep -v grep | grep OSW
oracle   27722     1  0 17:31 ?        00:00:00 /bin/sh ./OSWatcher.sh 60 12 gzip /nas1/OSWATCHER/racnode1
oracle   27906 27722  0 17:32 ?        00:00:00 /bin/sh ./OSWatcherFM.sh 12 /nas1/OSWATCHER/racnode1
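When you need to stop collection (before patching the node, say), the stopOSWbb.sh script from the earlier file listing does it:

[oracle@racnode1 oswbb]$ ./stopOSWbb.sh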
Task #4.4e: Analyze.
Analyzing the OSWatcher data might be a topic of a future update. You can’t have everything all in one go.
Task #4.5: Check I/O Calibration.
The RAC Database Health Check reported this failure:
Verification Check        : FILESYSTEMIO_OPTIONS
Verification Description  : Checks FILESYSTEMIO_OPTIONS parameter
Verification Result       : WARNING
Verification Summary      : Check for FILESYSTEMIO_OPTIONS failed
Additional Details        : FILESYSTEMIO_OPTIONS=setall supports both direct I/O and
                            asynchronus I/O which helps to achieve optimal performance
                            with database data files

Database(Instance)  Status  Expected Value                 Actual Value
---------------------------------------------------------------------------------------
rac1db              FAILED  filesystemio_options = SETALL  filesystemio_options = none
We know our RAC database is using ASM for its storage which should optimize I/O for us. One of the prerequisites of running Oracle’s I/O Calibration is to ensure asynchronous I/O is enabled for all the database datafiles, so let’s check that now:
SQL> SELECT NAME, ASYNCH_IO FROM V$DATAFILE F, V$IOSTAT_FILE I WHERE F.FILE#=I.FILE_NO AND FILETYPE_NAME='Data File'; NAME ASYNCH_IO -------------------------------------------------- --------- +DATA/RAC1DB/DATAFILE/system.258.898779945 ASYNC_ON +DATA/RAC1DB/DATAFILE/sysaux.257.898779761 ASYNC_ON +DATA/RAC1DB/DATAFILE/undotbs1.260.898780207 ASYNC_ON +DATA/RAC1DB/DATAFILE/undotbs2.265.898783783 ASYNC_ON +DATA/RAC1DB/DATAFILE/users.259.898780201 ASYNC_ON
Asynchronous I/O seems to be in play. We know all these files are in the +DATA ASM Diskgroup, which contains 2 ASM Disks:
SQL> select dg.name DG, d.name Disk, decode(d.GROUP_NUMBER, 0,'Unallocated', 'In Use') State, d.path from v$asm_disk d left outer join v$asm_diskgroup dg on dg.group_number = d.group_number order by dg.name, d.path; DG DISK STATE PATH ---------- ------------ ----------- ------------------------------ CRS CRS_0000 In Use /dev/oracleasm/disks/ASMDISK01 CRS2 CRS2_0000 In Use /dev/oracleasm/disks/ASMDISK07 DATA DATA_0000 In Use /dev/oracleasm/disks/ASMDISK02 DATA DATA_0001 In Use /dev/oracleasm/disks/ASMDISK03 DATA2 DATA2_0003 In Use /dev/oracleasm/disks/ASMDISK06 DATA2 DATA2_0000 In Use /dev/oracleasm/disks/ASMDISK08 DATA2 DATA2_0001 In Use /dev/oracleasm/disks/ASMDISK09 DATA2 DATA2_0002 In Use /dev/oracleasm/disks/ASMDISK10 FRA FRA_0000 In Use /dev/oracleasm/disks/ASMDISK04 FRA FRA_0001 In Use /dev/oracleasm/disks/ASMDISK05
I/O calibration is discussed in the Oracle Database Performance Tuning Guide and can be reviewed here. Let’s run the calibration code multiple times to see what results we get with FILESYSTEMIO_OPTIONS set to NONE and SETALL. Here’s the calibration code:
SET SERVEROUTPUT ON
DECLARE
  lat  INTEGER;
  iops INTEGER;
  mbps INTEGER;
BEGIN
  -- DBMS_RESOURCE_MANAGER.CALIBRATE_IO (<DISKS>, <MAX_LATENCY>, iops, mbps, lat);
  DBMS_RESOURCE_MANAGER.CALIBRATE_IO (2, 10, iops, mbps, lat);
  DBMS_OUTPUT.PUT_LINE ('max_iops = ' || iops);
  DBMS_OUTPUT.PUT_LINE ('latency  = ' || lat);
  DBMS_OUTPUT.PUT_LINE ('max_mbps = ' || mbps);
END;
/
Having run the code multiple times with FILESYSTEMIO_OPTIONS set to NONE, let’s change it to SETALL. It’s not a dynamic parameter so we’ll need to bounce the database:
SQL> show parameter filesystemio_options NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ filesystemio_options string none SQL> alter system set filesystemio_options = SETALL scope=spfile; System altered. [oracle@racnode1 racnode1]$ srvctl stop database -d RAC1DB [oracle@racnode1 racnode1]$ srvctl start database -d RAC1DB [oracle@racnode1 racnode1]$ srvctl status database -d RAC1DB Instance RAC1DB1 is running on node racnode1 Instance RAC1DB2 is running on node racnode2 SQL> show parameter filesystemio_options NAME TYPE VALUE ------------------------------------ ----------- ------------------------------ filesystemio_options string SETALL
These are the results, averaged over multiple runs of the code:
FILESYSTEMIO_OPTIONS | Max IOPS | Latency (ms) | Max MBPS |
---|---|---|---|
NONE | 9,631 | 9.67 | 105.67 |
SETALL | 9,796 | 8 | 105.67 |
As you can see, the Max IOPS improved slightly with SETALL, which also delivered slightly lower latency. That said, the two sets of results are so close that it makes little difference. However, things didn’t get worse by setting FILESYSTEMIO_OPTIONS to SETALL, so let’s call that a win. Sort of.
Task #4.6: Apply a Patch (22191349).
ORAchk recommended applying the Grid Infrastructure Patch Set Update 12.1.0.2.160119 (JAN2016). There are multiple steps involved, so let’s get started.
Note, this is a rolling patch, so it can be applied to a running system. Clusterware is shut down and patched one node at a time as you apply the patch.
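If you want to keep an eye on the cluster’s patch state as you work through the nodes, this quick check (run from any node) reports the active version, the patch level and whether the cluster is in rolling patch mode:
[grid@racnode1 ~]$ crsctl query crs activeversion -f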
Task #4.6a: Update opatch.
To apply this patch, the opatch utility must be version 12.1.0.1.7 or later. So let’s check the version in the GI and database $ORACLE_HOME directories:
[grid@racnode1 OPatch]$ ./opatch version OPatch Version: 12.1.0.1.3 OPatch succeeded. [oracle@racnode1 OPatch]$ ./opatch version OPatch Version: 12.1.0.1.3 OPatch succeeded.
We need to update opatch, which is done by downloading the latest version from MOS (patch #6880880).
We’ll update the GI and database $ORACLE_HOME directories on racnode1 first, taking a backup of the original OPatch directories as we go.
[grid@racnode1 grid]$ pwd /u01/app/12.1.0/grid [grid@racnode1 grid]$ mv OPatch OPatch_12.1.0.1.3 mv: cannot move `OPatch' to `OPatch_12.1.0.1.3': Permission denied
The directory is owned by root:oinstall and the group does not have write permission. Let’s fix that:
[root@racnode1 12.1.0]# pwd /u01/app/12.1.0 [root@racnode1 12.1.0]# ls -la drwxr-xr-x 3 root oinstall 4096 Dec 17 18:09 . drwxr-xr-x 6 root oinstall 4096 Dec 17 13:07 .. drwxr-xr-x 73 root oinstall 4096 Dec 17 22:34 grid [root@racnode1 12.1.0]# chmod 775 grid [grid@racnode1 grid]$ mv OPatch OPatch_12.1.0.1.3
Now copy the new opatch zip file into the GI home and unzip it:
[grid@racnode1 grid]$ cp /nas1/PATCHES/6880880/p6880880_121010_Linux-x86-64.zip . [grid@racnode1 grid]$ unzip p6880880_121010_Linux-x86-64.zip
Check the new opatch version then remove the zip file:
[grid@racnode1 OPatch]$ ./opatch version OPatch Version: 12.1.0.1.10 OPatch succeeded. [grid@racnode1 OPatch]$ cd .. [grid@racnode1 grid]$ ls -l *.zip -rw-r--r-- 1 grid oinstall 52774823 Feb 29 11:48 p6880880_121010_Linux-x86-64.zip [grid@racnode1 grid]$ rm -f p6880880_121010_Linux-x86-64.zip
Note, don’t forget to change the GI $ORACLE_HOME permissions back to 755.
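For example, as root (reversing the earlier chmod 775):
[root@racnode1 12.1.0]# chmod 755 /u01/app/12.1.0/grid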
Now update opatch in the database $ORACLE_HOME:
[oracle@racnode1 dbhome_1]$ mv OPatch OPatch_12.1.0.1.3 [oracle@racnode1 dbhome_1]$ cp /nas1/PATCHES/6880880/p6880880_121010_Linux-x86-64.zip . [oracle@racnode1 dbhome_1]$ unzip p6880880_121010_Linux-x86-64.zip [oracle@racnode1 dbhome_1]$ cd OPatch [oracle@racnode1 OPatch]$ ./opatch version OPatch Version: 12.1.0.1.10 OPatch succeeded. [oracle@racnode1 dbhome_1]$ cd .. [oracle@racnode1 dbhome_1]$ ls -l *.zip -rw-r--r-- 1 oracle oinstall 52774823 Feb 29 11:59 p6880880_121010_Linux-x86-64.zip [oracle@racnode1 dbhome_1]$ rm -f p6880880_121010_Linux-x86-64.zip
Now repeat the same tasks for the GI and database $ORACLE_HOME directories on racnode2.
Task #4.6b: Create an Oracle Configuration Manager (OCM) Response File.
You must have an OCM response file with file permissions of 775.
[grid@racnode1 bin]$ pwd
/u01/app/12.1.0/grid/OPatch/ocm/bin
[grid@racnode1 bin]$ ./emocmrsp -no_banner -output /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp
Provide your email address to be informed of security issues, install and
initiate Oracle Configuration Manager. Easier for you if you use your My
Oracle Support Email address/User Name.
Visit http://www.oracle.com/support/policies.html for details.
Email address/User Name:
You have not provided an email address for notification of security issues.
Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y
The OCM configuration response file (/u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp) was
successfully created.
[grid@racnode1 bin]$ cd ..
[grid@racnode1 ocm]$ chmod 775 ocm.rsp
[grid@racnode1 ocm]$ ls -l
drwxr-x--- 2 grid oinstall 4096 Nov 13 14:15 bin
drwxr-x--- 2 grid oinstall 4096 Nov 13 14:15 doc
drwxr-x--- 2 grid oinstall 4096 Nov 13 14:15 lib
-rw-r----- 1 grid oinstall 638 Nov 13 14:15 ocm_platforms.txt
-rwxrwxr-x 1 grid oinstall 621 Feb 29 12:44 ocm.rsp
-rwxr-x--- 1 grid oinstall 42560960 Nov 13 14:15 ocm.zip
Repeat this step for the GI $ORACLE_HOME on racnode2.
Task #4.6c: Check the $ORACLE_HOME Inventories.
On racnode1, we need to check the GI and database $ORACLE_HOME inventories.
[grid@racnode1 OPatch]$ ./opatch lsinventory -detail -oh /u01/app/12.1.0/grid
The output is quite verbose, but can be viewed here.
[oracle@racnode1 OPatch]$ ./opatch lsinventory -detail -oh /u01/app/oracle/product/12.1.0/dbhome_1
Again, the output is quite verbose, but it can be viewed here.
Checking the inventories tells us there are no interim patches applied to either the GI or database $ORACLE_HOME directories.
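If you only want the headline rather than the full listing, filtering the lsinventory output is a quick sanity check. A minimal sketch:
[grid@racnode1 OPatch]$ ./opatch lsinventory -oh /u01/app/12.1.0/grid | grep -i "interim patches"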
Task #4.6d: Unzip the Patch.
The patch needs to be unzipped in a location accessible to all nodes in the cluster. It makes things easier that way. Fortunately, I have some NAS storage which is mounted on both RAC nodes.
[grid@racnode1 grid]$ cd /nas1/PATCHES/22191349/grid [grid@racnode1 grid]$ unzip p22191349_121020_Linux-x86-64.zip
Task #4.6e: Check For One-Off Patch Conflicts.
We already know that neither $ORACLE_HOME directory contains any patches at this point, so there can’t be any conflicts by definition. Still, we’ll run the conflict check anyway, which must be done as the root user:
[root@racnode1 grid]# pwd /nas1/PATCHES/22191349/grid [root@racnode1 grid]# export PATH=$PATH:/u01/app/12.1.0/grid/OPatch [root@racnode1 grid]# which opatchauto /u01/app/12.1.0/grid/OPatch/opatchauto [root@racnode1 grid]# /u01/app/12.1.0/grid/OPatch/opatchauto apply /nas1/PATCHES/22191349/grid/22191349 -analyze -ocmrf /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp OPatch Automation Tool Copyright (c)2014, Oracle Corporation. All rights reserved. OPatchauto Version : 12.1.0.1.10 OUI Version : 12.1.0.2.0 Running from : /u01/app/12.1.0/grid opatchauto log file: /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/22191349/ opatch_gi_2016-02-29_14-02-15_analyze.log NOTE: opatchauto is running in ANALYZE mode. There will be no change to your system. OCM RSP file has been ignored in analyze mode. Parameter Validation: Successful Configuration Validation: Successful Patch Location: /nas1/PATCHES/22191349/grid/22191349 Grid Infrastructure Patch(es): 21436941 21948341 21948344 21948354 DB Patch(es): 21948344 21948354 Patch Validation: Successful Grid Infrastructure home: /u01/app/12.1.0/grid DB home(s): /u01/app/oracle/product/12.1.0/dbhome_1 Analyzing patch(es) on "/u01/app/oracle/product/12.1.0/dbhome_1" ... Patch "/nas1/PATCHES/22191349/grid/22191349/21948344" successfully analyzed on "/u01/app/oracle/product/12.1.0/dbhome_1" for apply. Patch "/nas1/PATCHES/22191349/grid/22191349/21948354" successfully analyzed on "/u01/app/oracle/product/12.1.0/dbhome_1" for apply. Analyzing patch(es) on "/u01/app/12.1.0/grid" ... Patch "/nas1/PATCHES/22191349/grid/22191349/21436941" successfully analyzed on "/u01/app/12.1.0/grid" for apply. Patch "/nas1/PATCHES/22191349/grid/22191349/21948341" successfully analyzed on "/u01/app/12.1.0/grid" for apply. Patch "/nas1/PATCHES/22191349/grid/22191349/21948344" successfully analyzed on "/u01/app/12.1.0/grid" for apply. Patch "/nas1/PATCHES/22191349/grid/22191349/21948354" successfully analyzed on "/u01/app/12.1.0/grid" for apply. SQL changes, if any, are analyzed successfully on the following database(s): RAC1DB Apply Summary: Following patch(es) are successfully analyzed: GI Home: /u01/app/12.1.0/grid: 21436941,21948341,21948344,21948354 DB Home: /u01/app/oracle/product/12.1.0/dbhome_1: 21948344,21948354 opatchauto succeeded.
As we suspected, no conflicts so we can proceed and apply the patch.
Task #4.6f: Apply the Patch.
There are some options for applying the patch. We’ll choose the ‘non-shared GI and database $ORACLE_HOME and ACFS is not configured’ option. Each node has to be patched separately on a running system, i.e. the cluster is up. Let’s patch racnode1 first and enjoy the nightmare together. 🙂
The patch is applied by the root user using this command template:
<GI_HOME>/OPatch/opatchauto apply <UNZIPPED_PATCH_LOCATION>/22191349 -ocmrf <ocm response file>
In our case, that translates into this:
[root@racnode1 ~]# /u01/app/12.1.0/grid/OPatch/opatchauto apply /nas1/PATCHES/22191349/grid/22191349 -ocmrf /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp
This had the effect of installing patches 21948344 and 21948354 into the database $ORACLE_HOME, but it failed to install any patches into the GI $ORACLE_HOME. The reason was insufficient disk space. The patch log file contained these messages:
... [Feb 29, 2016 2:27:40 PM] Running prerequisite checks... [Feb 29, 2016 2:27:41 PM] Space Needed : 8180.049MB [Feb 29, 2016 2:27:41 PM] Space Usable : 7786.043MB [Feb 29, 2016 2:27:41 PM] Required amount of space(8180.049MB) is not available. [Feb 29, 2016 2:27:41 PM] Prerequisite check "CheckSystemSpace" failed. The details are: Required amount of space(8180.049MB) is not available. ...
The /u01 file system did not have the 8 GB of free disk space it needed. It would have been nice if the patch conflict command had told me this. Thanks Oracle! As is customary in patch README files, the command to roll back the patch is also documented. However, attempts to roll back the patch also failed because Clusterware wasn’t running and my attempts to re-start it on racnode1 repeatedly failed. A node reboot solved that problem and I tried to roll back the patch for a second time. That failed as well. This time the rollback command reported these errors in the log file:
COMMAND EXECUTION FAILURE : Using configuration parameter file: /u01/app/12.1.0/grid/OPatch/opatchautotemp_racnode1/ patchwork/crs/install/crsconfig_params Oracle Clusterware active version on the cluster is [12.1.0.2.0]. The cluster upgrade state is [ROLLING PATCH]. The cluster active patch level is [0]. PROC-28: Oracle Cluster Registry already in current version CRS-1153: There was an error setting Oracle Clusterware to rolling patch mode. CRS-4000: Command Start failed, or completed with errors. 2016/02/29 16:01:23 CLSRSC-430: Failed to start rolling patch mode ERROR: Died at /u01/app/12.1.0/grid/OPatch/opatchautotemp_racnode1/patchwork/crs/ install/crspatch.pm line 776.
Clusterware was already in ‘rolling patch mode’, so you’d think trying to set it to that mode wouldn’t be a problem, right? Wrong! Unfortunately, this is an unpublished bug. Fortunately, Oracle already knows about it and has a workaround documented in MOS, Doc ID 1943498.1. It boils down to running this command:
[root@racnode1 ~]# crsctl stop rollingpatch
This command appears to switch Clusterware back into NORMAL mode, so it can be switched into ROLLING PATCH mode when you attempt to roll back or apply the patch. The log file also contains this rather useful tidbit of information:
--------------After fixing the cause of failure you have two options shown below: Run 'opatchauto resume' or Manually run the commands listed below --------------------------------------------------------------------------------- ...
I wasn’t about to trust ‘opatchauto resume’, so I opted for running the 17 fully documented commands in sequential order. Some needed to be run as root, some as oracle and some as grid. Unbelievably, they all worked! We can see the patches which were applied by running opatch lsinventory commands:
[grid@racnode1 OPatch]$ ./opatch lsinventory
The output can be viewed here.
[oracle@racnode1 OPatch]$ ./opatch lsinventory
The output can be viewed here.
Note, opatch stores patch rollback information and files in a sub-directory within each $ORACLE_HOME called .patch_storage. This has the potential to consume a large quantity of disk space and thus reduce the free space available in the filesystem which contains the Oracle code installations. Running out of disk space in this filesystem is bad news. You should NOT simply delete or archive the .patch_storage directory. Instead, Oracle recommends one of 3 courses of action in the event disk space becomes an issue:
- Increase the size of the filesystem. (I know – genius right? 😂)
- Use symbolic links and relocate .patch_storage some place else. (This is what I did – worked fine)
- Run the command: opatch util cleanup. (Documented in MOS Doc ID 550522.1)
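To see how much space .patch_storage is actually consuming, and how much headroom the filesystem has left, a quick check along these lines (using the paths from this build) does the job:
[root@racnode1 ~]# du -sh /u01/app/12.1.0/grid/.patch_storage /u01/app/oracle/product/12.1.0/dbhome_1/.patch_storage
[root@racnode1 ~]# df -h /u01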
So, with the $ORACLE_HOME directories patched on racnode1, it’s time to attempt applying the patch on racnode2. Yes, I did check I had more than 8 GB of free space in the /u01 filesystem before I started. Thanks for asking. When applying the patch actually works as intended, it looks like this:
[root@racnode2 ~]# . oraenv ORACLE_SID = [+ASM2] ? The Oracle base remains unchanged with value /u01/app/grid [root@racnode2 trace]# export PATH=$PATH:/u01/app/12.1.0/grid/OPatch [root@racnode2 trace]# which opatchauto /u01/app/12.1.0/grid/OPatch/opatchauto [root@racnode2 ~]# /u01/app/12.1.0/grid/OPatch/opatchauto apply /nas1/PATCHES/22191349/grid/22191349 -ocmrf /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp OPatch Automation Tool Copyright (c)2014, Oracle Corporation. All rights reserved. OPatchauto Version : 12.1.0.1.10 OUI Version : 12.1.0.2.0 Running from : /u01/app/12.1.0/grid opatchauto log file : /u01/app/12.1.0/grid/cfgtoollogs/opatchauto/22191349/ opatch_gi_2016-03-01_13-17-42_deploy.log Parameter Validation: Successful Configuration Validation: Successful Patch Location: /nas1/PATCHES/22191349/grid/22191349 Grid Infrastructure Patch(es): 21436941 21948341 21948344 21948354 DB Patch(es): 21948344 21948354 Patch Validation: Successful Grid Infrastructure home: /u01/app/12.1.0/grid DB home(s): /u01/app/oracle/product/12.1.0/dbhome_1 Performing prepatch operations on CRS Home... Successful Applying patch(es) to "/u01/app/oracle/product/12.1.0/dbhome_1" ... Patch "/nas1/PATCHES/22191349/grid/22191349/21948344" successfully applied to "/u01/app/oracle/product/12.1.0/dbhome_1". Patch "/nas1/PATCHES/22191349/grid/22191349/21948354" successfully applied to "/u01/app/oracle/product/12.1.0/dbhome_1". Applying patch(es) to "/u01/app/12.1.0/grid" ... Patch "/nas1/PATCHES/22191349/grid/22191349/21436941" successfully applied to "/u01/app/12.1.0/grid". Patch "/nas1/PATCHES/22191349/grid/22191349/21948341" successfully applied to "/u01/app/12.1.0/grid". Patch "/nas1/PATCHES/22191349/grid/22191349/21948344" successfully applied to "/u01/app/12.1.0/grid". Patch "/nas1/PATCHES/22191349/grid/22191349/21948354" successfully applied to "/u01/app/12.1.0/grid". Performing postpatch operations on CRS Home... Successful SQL changes, if any, are applied successfully on the following database(s): RAC1DB Apply Summary: Following patch(es) are successfully installed: GI Home: /u01/app/12.1.0/grid: 21436941,21948341,21948344,21948354 DB Home: /u01/app/oracle/product/12.1.0/dbhome_1: 21948344,21948354 opatchauto succeeded.
Again, running the opatch lsinventory command in each $ORACLE_HOME will verify the patches have been applied.
Task #4.6g: Run datapatch.
The final step involves running a script called datapatch, which is located in the OPatch directory under $ORACLE_HOME. The documentation says that for a RAC environment you run this from only one node. We’re left to assume we do this after all the nodes have been patched. There are instructions for both a “Standalone DB” and “Single/Multitenant (CDB/PDB) DB”. We know our user RAC database is not a CDB, but the GIMR database (_MGMTDB) is. Since the GI $ORACLE_HOME was patched and that codeset runs the _MGMTDB database, again we have to assume we need to run the datapatch script for that database as well. It would be nice if Oracle’s documentation made that clear.
The RAC1DB instances are already up and running, so we’ll dive straight in and run the datapatch script from racnode1:
[oracle@racnode1 OPatch]$ pwd /u01/app/oracle/product/12.1.0/dbhome_1/OPatch [oracle@racnode1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.2.0 on Wed Mar 2 10:16:57 2016 Copyright (c) 2015, Oracle. All rights reserved. Log file for this invocation: /u01/app/oracle/cfgtoollogs/sqlpatch/ sqlpatch_2324_2016_03_02_10_16_57/sqlpatch_invocation.log Connecting to database...OK Bootstrapping registry and package to current versions...done Determining current state...done Current state of SQL patches: Bundle series PSU: ID 160119 in the binary registry and ID 160119 in the SQL registry Adding patches to installation queue and performing prereq checks... Installation queue: Nothing to roll back Nothing to apply SQL Patching tool complete on Wed Mar 2 10:19:11 2016
Finally, we need to verify everything worked:
SQL> select patch_id, action, status from dba_registry_sqlpatch; PATCH_ID ACTION STATUS --------- --------------- --------------- 21948354 APPLY SUCCESS
Next, we’ll run the datapatch script in the _MGMTDB database. The script requires that all pluggable databases be open, so let’s double check that before we run the script:
[grid@racnode1 ~]$ . oraenv ORACLE_SID = [-MGMTDB] ? The Oracle base remains unchanged with value /u01/app/grid [grid@racnode1 ~]$ sqlplus / as sysdba SQL> select name, open_mode from v$pdbs; NAME OPEN_MODE ------------------------------ ---------- PDB$SEED READ ONLY CLUSTER1 READ WRITE [grid@racnode1 ~]$ cd $ORACLE_HOME/OPatch [grid@racnode1 OPatch]$ pwd /u01/app/12.1.0/grid/OPatch [grid@racnode1 OPatch]$ ./datapatch -verbose SQL Patching tool version 12.1.0.2.0 on Wed Mar 2 10:48:36 2016 Copyright (c) 2015, Oracle. All rights reserved. Log file for this invocation: /u01/app/grid/cfgtoollogs/sqlpatch/ sqlpatch_14626_2016_03_02_10_48_36/sqlpatch_invocation.log Connecting to database...OK Note: Datapatch will only apply or rollback SQL fixes for PDBs that are in an open state, no patches will be applied to closed PDBs. Please refer to Note: Datapatch: Database 12c Post Patch SQL Automation (Doc ID 1585822.1) Bootstrapping registry and package to current versions...done Determining current state...done Current state of SQL patches: Bundle series PSU: ID 160119 in the binary registry and ID 160119 in PDB CDB$ROOT, ID 160119 in PDB PDB$SEED, ID 160119 in PDB CLUSTER1 Adding patches to installation queue and performing prereq checks... Installation queue: For the following PDBs: CDB$ROOT PDB$SEED CLUSTER1 Nothing to roll back Nothing to apply SQL Patching tool complete on Wed Mar 2 10:50:57 2016
Finally, we need to verify everything worked:
SQL> select patch_id, action, status from dba_registry_sqlpatch; PATCH_ID ACTION STATUS -------- --------------- --------------- 21948354 APPLY SUCCESS
All appears to be well. Job done!
Task #4.7: Apply a PSU (January 2016 PSU – Patch 22191676).
During the writing of Part 11, the January 2016 PSUs were released by Oracle. These are cumulative and bundle together many patches which were released previously. As always, we read the README file which comes with the patch very carefully to see what we’re getting into.
Note, this is a non-rolling patch and requires downtime to apply it.
The January 2016 PSU contains these patches:
Patch # | Description | Applicable Homes |
---|---|---|
21948354 | DB PSU 12.1.0.2.160119 | Both DB Homes and Grid Home |
21948344 | OCW PSU 12.1.0.2.160119 | Both DB Homes and Grid Home |
22139226 | OJVM 12.1.0.2.160119 | Only DB Homes |
21948341 | ACFS PSU 12.1.0.2.160119 | Only Grid Home |
21436941 | DBWLM PSU 12.1.0.2.5 | Only Grid Home |
Since we’ve already applied Patch #22191349, we can check to see if there’s any overlap. These are the results of running opatch lsinventory for the GI and database $ORACLE_HOME directories:
Patch # | Applied to Homes |
---|---|
21948354 | DB & GI |
21948344 | DB & GI |
21948341 | GI |
21436941 | GI |
So there is a significant overlap with only Patch #22139226 adding anything new. We could apply just that one patch or we could test the application of the whole patch to make sure it’s smart enough to skip already applied patches and apply just the one we want (#22139226). So let’s do that. We’ll follow the same approach as before, even though we know some of the ground work has already been done.
Task #4.7a: Update opatch.
We must use opatch version 12.1.0.1.7 or later. We already have version 12.1.0.1.10.
Task #4.7b: Create an Oracle Configuration Manager (OCM) Response File.
Since the environment has changed since we last generated it, we’ll go ahead and re-generate the ocm.rsp file. Just for fun, I ran the diff command to compare the old ocm.rsp file with the new one and it reported they were different, so we’ll use the new one just to be on the safe side.
[grid@racnode1 bin]$ pwd /u01/app/12.1.0/grid/OPatch/ocm/bin [grid@racnode1 bin]$ ./emocmrsp -no_banner -output /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp Provide your email address to be informed of security issues, install and initiate Oracle Configuration Manager. Easier for you if you use your My Oracle Support Email address/User Name. Visit http://www.oracle.com/support/policies.html for details. Email address/User Name: <just press Enter here> You have not provided an email address for notification of security issues. Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y The OCM configuration response file (/u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp) was successfully created. [grid@racnode1 bin]$ cd .. [grid@racnode1 ocm]$ chmod 775 ocm.rsp
Repeat this step on racnode2.
Task #4.7c: Check the $ORACLE_HOME Inventories.
As before, the point of this step is to record what the inventories looked like before the patch was applied. The documentation also says that if this command fails you should contact Oracle Support, so presumably a successful run means the inventories are healthy.
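Since this output is effectively your ‘before’ picture, it’s worth capturing it to files you can diff against after the patch. A minimal sketch (the file names are just examples):
[grid@racnode1 OPatch]$ ./opatch lsinventory -detail -oh /u01/app/12.1.0/grid > /tmp/gi_inventory_before_22191676.txt
[oracle@racnode1 OPatch]$ ./opatch lsinventory -detail -oh /u01/app/oracle/product/12.1.0/dbhome_1 > /tmp/db_inventory_before_22191676.txt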
Task #4.7d: Unzip the Patch.
[grid@racnode1 ~]$ cd /nas1/PATCHES/22191676/grid [grid@racnode1 grid]$ unzip p22191676_121020_Linux-x86-64.zip
Task #4.7e: Check For One-Off Patch Conflicts.
This is where the patch software should detect the presence of the existing patches and report that they’ll be skipped, leaving only Patch #22139226 to be applied to the database $ORACLE_HOME. Here’s the command to check for conflicts:
[root@racnode1 ~]# /u01/app/12.1.0/grid/OPatch/opatchauto apply /nas1/PATCHES/22191676/grid/22191676 -analyze -ocmrf /u01/app/12.1.0/grid/OPatch/ocm/ocm.rsp
So what did this command report? Well, the main opatchauto log file reported this:
... The following patch(es) are duplicate patches with patches installed in the Oracle Home. [ 21948354] You have already installed same patch(es) with same UPI(s) or same version(s). These patch(es) will be skipped. Opatchauto skipped installing the above patch in /u01/app/12.1.0/grid ...
The GI $ORACLE_HOME log file (referenced in the main opatchauto log file) reported this:
... The following patch(es) are duplicate patches with patches installed in the Oracle Home. [ 21436941 21948341 21948344] You have already installed same patch(es) with same UPI(s) or same version(s). These patch(es) will be skipped. ...
So, for the GI $ORACLE_HOME, Patches #21948354, #21948344, #21948341 and #21436941 will all be skipped. This is correct!
The main opatchauto log file also reported this:
... The following patch(es) are duplicate patches with patches installed in the Oracle Home. [ 21948354] You have already installed same patch(es) with same UPI(s) or same version(s). These patch(es) will be skipped. Opatchauto skipped installing the above patch in /u01/app/oracle/product/12.1.0/dbhome_1 ...
The database $ORACLE_HOME log file (referenced in the main opatchauto log file) reported this:
... Patches to apply -> [ 21948344 22139226 ] Duplicate patches to filter -> [ 21948344 ] The following patches are duplicate and are skipped: [ 21948344 ] OPatch found that the following patches are not required. All of the constituents of the composite patch [ 21948354 ] are already installed in the Oracle Home. ...
So, for the database $ORACLE_HOME, Patches #21948354 and #21948344 will be skipped, leaving Patch #22139226 to be applied. Again, this is correct!
Task #4.7f: Apply the Patch.
Since we know there are no patches to apply to the GI $ORACLE_HOME, we can choose the option to only apply the patch to the database $ORACLE_HOME. The command template for that is:
opatchauto apply <UNZIPPED_PATCH_LOCATION>/22191676 -oh <oracle_home1_path>,<oracle_home2_path> -ocmrf <ocm response file>
Since we’ll be using the opatchauto command from the database $ORACLE_HOME, we should probably generate an OCM response file there:
[oracle@racnode1 bin]$ pwd /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/bin [oracle@racnode1 bin]$ ./emocmrsp -no_banner -output /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/ocm.rsp Provide your email address to be informed of security issues, install and initiate Oracle Configuration Manager. Easier for you if you use your My Oracle Support Email address/User Name. Visit http://www.oracle.com/support/policies.html for details. Email address/User Name: <just press Enter here> You have not provided an email address for notification of security issues. Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y The OCM configuration response file (/u01/app/oracle/product/12.1.0/dbhome_1/ OPatch/ocm/ocm.rsp) was successfully created. [oracle@racnode1 bin]$ cd .. [oracle@racnode1 ocm]$ chmod 775 ocm.rsp
First, we need to stop the RAC database:
[oracle@racnode1 ~]$ . oraenv ORACLE_SID = [RAC1DB1] ? The Oracle base remains unchanged with value /u01/app/oracle [oracle@racnode1 ~]$ srvctl stop database -d RAC1DB
Next, stop Clusterware on the remote node (racnode2):
[root@racnode2 ~]# . oraenv ORACLE_SID = [+ASM2] ? The Oracle base remains unchanged with value /u01/app/grid [root@racnode2 ~]# crsctl stop crs
Next, ensure Clusterware is still running on the local node (racnode1):
[root@racnode1 ~]# . oraenv ORACLE_SID = [+ASM1] ? The Oracle base remains unchanged with value /u01/app/grid [root@racnode1 ~]# crsctl check crs CRS-4638: Oracle High Availability Services is online CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online
Next, run the opatchauto command according to the template referenced earlier:
[root@racnode1 ~]# opatchauto apply /nas1/PATCHES/22191676/grid/22191676 -oh /u01/app/oracle/product/12.1.0/dbhome_1 -ocmrf /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/ocm.rsp
The screen output can be viewed here. Overall, it says Patch #22139226 was applied, but that some SQL was not run because the RAC database was down. The good news is the SQL commands/scripts which were not executed are documented in the log file. All they do is check for the existence of the SPFILE, set CLUSTER_DATABASE=FALSE, start an instance in upgrade mode, run the datapatch script (in the $ORACLE_HOME/OPatch directory), then set CLUSTER_DATABASE=TRUE. We’ll apply the patch to racnode2, then run these SQL commands/scripts.
Following the same sequence of steps for racnode2, first generate a new OCM response file:
[oracle@racnode2 bin]$ pwd /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/bin [oracle@racnode2 bin]$ ./emocmrsp -no_banner -output /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/ocm.rsp Provide your email address to be informed of security issues, install and initiate Oracle Configuration Manager. Easier for you if you use your My Oracle Support Email address/User Name. Visit http://www.oracle.com/support/policies.html for details. Email address/User Name: <just press Enter here> You have not provided an email address for notification of security issues. Do you wish to remain uninformed of security issues ([Y]es, [N]o) [N]: Y The OCM configuration response file (/u01/app/oracle/product/12.1.0/dbhome_1/ OPatch/ocm/ocm.rsp) was successfully created. [oracle@racnode2 bin]$ cd .. [oracle@racnode2 ocm]$ chmod 775 ocm.rsp
Next, re-start Clusterware on racnode2:
[root@racnode2 22191676]# . oraenv ORACLE_SID = [+ASM2] ? The Oracle base remains unchanged with value /u01/app/grid [root@racnode2 22191676]# crsctl start crs
After a few moments, check the Clusterware software stack has re-started:
[root@racnode2 22191676]# crsctl check crs CRS-4638: Oracle High Availability Services is online CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online
Next, shutdown Clusterware on the remote node (racnode1):
[root@racnode1 ~]# . oraenv ORACLE_SID = [RAC1DB1] ? +ASM1 The Oracle base has been changed from /u01/app/oracle to /u01/app/grid [root@racnode1 ~]# crsctl stop crs
Next, set your environment and run the opatchauto command:
[root@racnode2 ~]# . oraenv ORACLE_SID = [root] ? RAC1DB2 The Oracle base has been set to /u01/app/oracle [root@racnode2 ~]# export PATH=$PATH:$ORACLE_HOME/OPatch [root@racnode2 ~]# which opatchauto /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/opatchauto
Next, run the opatchauto command:
[root@racnode2 ~]# opatchauto apply /nas1/PATCHES/22191676/grid/22191676 -oh /u01/app/oracle/product/12.1.0/dbhome_1 -ocmrf /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/ocm/ocm.rsp
The screen output was the same as the opatchauto run on racnode1 and Patch #22139226 was successfully applied. Now we’ll run the SQL commands which were missed due to the RAC database not being up and running.
The first SQL script just checks for the existence of an SPFILE, so we’ll skip that.
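If you’d rather verify it yourself, a quick check from SQL*Plus does the trick (a sketch):
[oracle@racnode2 ~]$ sqlplus / as sysdba
SQL> show parameter spfile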
The second SQL script sets the CLUSTER_DATABASE parameter to FALSE in the SPFILE:
[oracle@racnode2 ~]$ srvctl start instance -d RAC1DB -n racnode2 [oracle@racnode2 ~]$ sqlplus / as sysdba SQL> @/u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/oplan/dataPatchfileForSetClusterFalse.sql System altered.
The third SQL script starts up the instance in upgrade mode:
[oracle@racnode2 ~]$ srvctl stop instance -d RAC1DB -n racnode2 [oracle@racnode2 ~]$ sqlplus / as sysdba Connected to an idle instance. SQL> @/u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/oplan/dataPatchfileForStartupUpgrade.sql ORACLE instance started. Total System Global Area 2147483648 bytes Fixed Size 2926472 bytes Variable Size 1207961720 bytes Database Buffers 922746880 bytes Redo Buffers 13848576 bytes Database mounted. Database opened.
The fourth script runs the datapatch script:
[oracle@racnode2 ~]$ /bin/bash -c 'ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 ORACLE_SID=RAC1DB2 /u01/app/oracle/product/12.1.0/dbhome_1/OPatch/datapatch' SQL Patching tool version 12.1.0.2.0 on Thu Mar 3 19:30:35 2016 Copyright (c) 2015, Oracle. All rights reserved. Connecting to database...OK Determining current state...done Adding patches to installation queue and performing prereq checks...done Installation queue: Nothing to roll back The following patches will be applied: 22139226 (Database PSU 12.1.0.2.160119, Oracle JavaVM Component (Jan2016)) Installing patches... Patch installation complete. Total patches installed: 1 Validating logfiles...done SQL Patching tool complete on Thu Mar 3 19:34:52 2016
The fifth SQL script sets the CLUSTER_DATABASE parameter back to TRUE in the SPFILE and shuts down the instance:
[oracle@racnode2 ~]$ sqlplus / as sysdba SQL> @/u01/app/oracle/product/12.1.0/dbhome_1/cfgtoollogs/oplan/dataPatchfileForSetClusterTrue.sql System altered. Database closed. Database dismounted. ORACLE instance shut down.
We can now re-start the Clusterware on racnode1:
[root@racnode1 ~]# crsctl start crs CRS-4123: Oracle High Availability Services has been started.
After waiting for a few moments, we can check the status of CRS:
[root@racnode1 ~]# crsctl check crs CRS-4638: Oracle High Availability Services is online CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online
Finally, we can re-start the RAC database:
[oracle@racnode2 ~]$ srvctl start database -d RAC1DB [oracle@racnode2 ~]$ srvctl status database -d RAC1DB Instance RAC1DB1 is running on node racnode1 Instance RAC1DB2 is running on node racnode2
Note:
- Always read and understand the patch README file before doing anything else.
- The README file will not always tell you exactly what to do. Sometimes you have to use common sense and your DBA gene to figure stuff out.
- Always test applying a patch to a sandbox or test system for 3 main reasons:
- To develop your own procedure so running it in production becomes a mechanical process with little to no thinking required.
- To verify that it actually works.
- To determine how long it takes so you can schedule an appropriate production outage if necessary.
Task #4.8: Add a Node to the Cluster.
We have a 2 node RAC cluster called cluster1, comprising the servers racnode1 and racnode2. Now let’s expand that cluster to 3 nodes by adding racnode3. After starting racnode3 using Oracle VM Manager and connecting to its console, there are multiple steps needed to prepare racnode3 to join the cluster. These mostly involve changes to the OS, networking and shared storage.
Task #4.8a: Install the Cloud Control Management Agent.
Let’s get this out of the way first. Installation instructions for the Management Agent can be found here.
Task #4.8b: Fix the Networking.
Remember that racnode3 is a clone of racnode1, so its networking will be identical to racnode1’s. We need to change that so racnode3 has its own networking setup. An example of how to do that is documented here.
Task #4.8c: Remove IPv6.
If you’re not using IPv6 addressing, its presence on the vNICs will only confuse matters. So it’s best to remove it. Instructions on how to do that can be found here.
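As a rough sketch of the approach on Oracle Linux 6 (the linked instructions have the full detail), IPv6 can be disabled via sysctl, together with setting IPV6INIT=no in the ifcfg-eth* files:
[root@racnode3 ~]# echo "net.ipv6.conf.all.disable_ipv6 = 1" >> /etc/sysctl.conf
[root@racnode3 ~]# echo "net.ipv6.conf.default.disable_ipv6 = 1" >> /etc/sysctl.conf
[root@racnode3 ~]# sysctl -p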
Task #4.8d: Fix the Hostname.
When racnode3 powers up, you’ll notice it’s called racnode1. We need to change its identity to racnode3. The procedure to do that is documented here.
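As a rough sketch of what’s involved on Oracle Linux 6 (the linked procedure has the detail), update HOSTNAME in /etc/sysconfig/network and set the running hostname:
[root@racnode3 ~]# sed -i 's/^HOSTNAME=.*/HOSTNAME=racnode3.mynet.com/' /etc/sysconfig/network
[root@racnode3 ~]# hostname racnode3.mynet.com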
Task #4.8e: Edit /etc/hosts.
On racnode3, update the /etc/hosts file so it’s identical to the other RAC nodes. You know it makes sense. A copy of this file can be found here. We will be configuring shared storage on the Openfiler server a little later, so in preparation for that add this entry to the /etc/hosts file on openfiler:
200.200.20.13 racnode3-storage
Task #4.8f: Edit /etc/resolv.conf.
Copy the /etc/resolv.conf file from either of the other 2 RAC nodes. It should contain just these entries:
# Generated by NetworkManager
search mynet.com
nameserver 200.200.10.16
Task #4.8g: Edit /etc/security/limits.conf.
Ensure this file has these entries:
####################################
# for oracle user
####################################
oracle soft nofile 8192
oracle hard nofile 65536
oracle soft nproc 2048
oracle hard nproc 16384
oracle soft stack 10240
oracle hard stack 32768
oracle soft core unlimited
oracle hard core unlimited
oracle hard memlock 5500631
####################################
# for grid user
####################################
grid soft nofile 8192
grid hard nofile 65536
grid soft nproc 2048
grid hard nproc 16384
grid soft stack 10240
grid hard stack 32768
grid soft core unlimited
grid hard core unlimited
Task #4.8h: Rename /etc/ntp.conf.
We’re using Oracle’s Cluster Time Synchronization Service rather than NTP to keep time within the cluster. However, the mere presence of this file can cause problems, so we need to rename it. Here are the steps to do that:
[root@racnode3 ~]# ls -l /etc/ntp.conf -rw-r--r--. 1 root root 1770 Oct 26 10:38 /etc/ntp.conf [root@racnode3 ~]# service ntpd status ntpd is stopped [root@racnode3 ~]# chkconfig ntpd off [root@racnode3 ~]# mv /etc/ntp.conf /etc/ntp.conf.ORIG
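Once Grid Infrastructure is up and running on racnode3 (later in this task), you can confirm CTSS has taken over time synchronization with this quick check:
[grid@racnode3 ~]$ crsctl check ctss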
Task #4.8i: Configure User Equivalency.
User equivalency, or passwordless SSH, needs to be set up for the racnode3 users oracle and grid. Instructions for configuring SSH can be found here.
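As a rough sketch of the manual method for one user (repeat for the grid user, and in each direction between all three nodes):
[oracle@racnode3 ~]$ ssh-keygen -t rsa
[oracle@racnode3 ~]$ ssh-copy-id oracle@racnode1
[oracle@racnode3 ~]$ ssh-copy-id oracle@racnode2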
Task #4.8j: Configure Shared Storage Within Openfiler.
First, we need to add racnode3-storage to the Network Access Configuration section. Open up the Openfiler management console, click on System, then add the entry for racnode3-storage using these values:
Name | Network/Host | Netmask | Type |
---|---|---|---|
racnode3-storage | 200.200.20.13 | 255.255.255.255 | Share |
The screen should look similar to this:

Next, click Volumes. The screen should change to this:

Then click iSCSI Targets over on the right hand side. The screen should change to this:

The following procedure should be repeated for iSCSI targets iqn.2006-01.com.openfiler:c1vg-vol01 through iqn.2006-01.com.openfiler:c1vg-vol10.
Use the Select iSCSI Target pull-down menu to select iqn.2006-01.com.openfiler:c1vg-vol01 (it should already be selected by default), then click the Change button. Click on Network ACL and use the Access pull-down menu to select Allow for racnode3-storage. The screen should look similar to this:

Click the Update button, then click on Target Configuration and select the next iSCSI target.
Task #4.8k: Configure Shared Storage at the OS Level.
First, verify the iSCSI client rpm is installed, then configure the iSCSI client to start on boot up of the node:
[root@racnode3 ~]# rpm -qa | grep iscsi iscsi-initiator-utils-6.2.0.873-14.0.1.el6.x86_64 [root@racnode3 ~]# chkconfig iscsid on [root@racnode3 ~]# chkconfig iscsi on
Next, discover the iSCSI Targets coming from the openfiler server:
[root@racnode3 ~]# iscsiadm -m discovery -t sendtargets -p openfiler-storage Starting iscsid: [ OK ] 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol10 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol10 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol09 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol09 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol08 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol08 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol07 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol07 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol06 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol06 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol05 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol05 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol04 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol04 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol03 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol03 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol02 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol02 200.200.20.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol01 200.200.10.6:3260,1 iqn.2006-01.com.openfiler:c1vg-vol01
Next, log into each of the iSCSI targets using the openfiler’s storage network address on eth1 (200.200.20.6):
[root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol01 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol01, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol01, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol02 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol02, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol02, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol03 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol03, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol03, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol04 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol04, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol04, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol05 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol05, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol05, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol06 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol06, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol06, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol07 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol07, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol07, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol08 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol08, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol08, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol09 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol09, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol09, portal: 200.200.20.6,3260] successful. [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol10 -p 200.200.20.6 -l Logging in to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol10, portal: 200.200.20.6,3260] (multiple) Login to [iface: default, target: iqn.2006-01.com.openfiler:c1vg-vol10, portal: 200.200.20.6,3260] successful.
Next, configure automatic iSCSI client login:
[root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol01 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol02 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol03 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol04 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol05 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol06 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol07 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol08 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol09 -p 200.200.20.6 --op update -n node.startup -v automatic [root@racnode3 ~]# iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol10 -p 200.200.20.6 --op update -n node.startup -v automatic
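Since only the volume number changes, a small loop achieves the same result; the same pattern works for the logins above. A sketch:
[root@racnode3 ~]# for i in $(seq -w 1 10); do iscsiadm -m node -T iqn.2006-01.com.openfiler:c1vg-vol${i} -p 200.200.20.6 --op update -n node.startup -v automatic; done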
Task #4.8l: Configure Shared Storage at the ASM Level.
First, check the ASM rpms are installed:
[root@racnode3 ~]# rpm -qa | grep oracleasm oracleasmlib-2.0.12-1.el6.x86_64 oracleasm-support-2.1.8-1.el6.x86_64
Next, configure Oracle ASM:
[root@racnode3 ~]# oracleasm configure -i Configuring the Oracle ASM library driver. This will configure the on-boot properties of the Oracle ASM library driver. The following questions will determine whether the driver is loaded on boot and what permissions it will have. The current values will be shown in brackets ('[]'). Hitting <ENTER> without typing an answer will keep that current value. Ctrl-C will abort. Default user to own the driver interface []: grid Default group to own the driver interface []: asmadmin Start Oracle ASM library driver on boot (y/n) [n]: y Scan for Oracle ASM disks on boot (y/n) [y]: y Writing Oracle ASM library driver configuration: done
Next, initialize Oracle ASM:
[root@racnode3 ~]# oracleasm init Creating /dev/oracleasm mount point: /dev/oracleasm Loading module "oracleasm": oracleasm Configuring "oracleasm" to use device physical block size Mounting ASMlib driver filesystem: /dev/oracleasm
Finally, instantiate the ASM Disks:
[root@racnode3 ~]# oracleasm scandisks Reloading disk partitions: done Cleaning any stale ASM disks... Scanning system for ASM disks... Instantiating disk "ASMDISK01" Instantiating disk "ASMDISK02" Instantiating disk "ASMDISK03" Instantiating disk "ASMDISK04" Instantiating disk "ASMDISK05" Instantiating disk "ASMDISK06" Instantiating disk "ASMDISK07" Instantiating disk "ASMDISK08" Instantiating disk "ASMDISK09" Instantiating disk "ASMDISK10"
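A quick way to confirm racnode3 now sees all ten ASM disks:
[root@racnode3 ~]# oracleasm listdisks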
Task #4.8m: Run the cluvfy Utility.
As the grid user, run the following cluvfy utility command from the original software installation node (racnode1):
[grid@racnode1 ~]$ cluvfy stage -pre nodeadd -n racnode3 -fixup -verbose
The output is very extensive and showed that all the tests passed except one: the swap space check. We can safely ignore that:
Check: Swap space Node Name Available Required Status ------------ ------------------------ ------------------------ ---------- racnode3 4GB (4194300.0KB) 5.8306GB (6113860.0KB) failed racnode1 4GB (4194300.0KB) 5.8306GB (6113860.0KB) failed Result: Swap space check failed
Task #4.8n: Run the GI addnode.sh.
As the grid user on racnode1, set the GI environment and run the addnode.sh script:
[grid@racnode1 ~]$ . oraenv ORACLE_SID = [+ASM1] ? The Oracle base remains unchanged with value /u01/app/grid [grid@racnode1 ~]$ cd $ORACLE_HOME/addnode [grid@racnode1 addnode]$ pwd /u01/app/12.1.0/grid/addnode [grid@racnode1 ~]$ ./addnode.sh
This invokes the Cluster Add Node GUI:

Click the Add button and enter these values for the Public Hostname and Virtual Hostname:

Click OK, then click Next:

Predictably it complained about swap space:

Click the Ignore All check box, then click Next. Click Yes:

Click Install:

The installation starts and the familiar progress bar is displayed:

After a few minutes, the run root script box appears:

In a root window on racnode3, run the root.sh script:
[root@racnode3 grid]# ./root.sh
The output can be viewed here.
When the root.sh script has completed, click OK:

Click Close and you’re done. Now let’s check that Grid Infrastructure really is up and running on racnode3.
First, run the post node add option of the cluvfy command:
[grid@racnode3 ~]$ cluvfy stage -post nodeadd -n racnode3
The output should not produce any errors.
Next, verify racnode3 is a member of the cluster:
[grid@racnode3 ~]$ olsnodes -n -i -s -t -a racnode1 1 racnode1-vip.mynet.com Active Hub Unpinned racnode2 2 racnode2-vip.mynet.com Active Hub Unpinned racnode3 3 racnode3-vip.mynet.com Active Hub Unpinned
Next, check CRS is up and running:
[grid@racnode3 ~]$ crsctl check crs CRS-4638: Oracle High Availability Services is online CRS-4537: Cluster Ready Services is online CRS-4529: Cluster Synchronization Services is online CRS-4533: Event Manager is online
Next, check the status of the SCAN VIPs:
[grid@racnode3 ~]$ srvctl status scan SCAN VIP scan1 is enabled SCAN VIP scan1 is running on node racnode1 SCAN VIP scan2 is enabled SCAN VIP scan2 is running on node racnode3 SCAN VIP scan3 is enabled SCAN VIP scan3 is running on node racnode2
Next, check the status of the nodeapps:
[grid@racnode3 ~]$ srvctl status nodeapps VIP racnode1-vip.mynet.com is enabled VIP racnode1-vip.mynet.com is running on node: racnode1 VIP racnode2-vip.mynet.com is enabled VIP racnode2-vip.mynet.com is running on node: racnode2 VIP racnode3-vip.mynet.com is enabled VIP racnode3-vip.mynet.com is running on node: racnode3 Network is enabled Network is running on node: racnode1 Network is running on node: racnode2 Network is running on node: racnode3 ONS is enabled ONS daemon is running on node: racnode1 ONS daemon is running on node: racnode2 ONS daemon is running on node: racnode3
Finally, check the status of ASM:
[grid@racnode3 ~]$ srvctl status asm ASM is running on racnode2,racnode3,racnode1
Task #4.8o: Run the Database addnode.sh.
With racnode3 now officially a member of the cluster, we need to make it a database server. This is done in two stages. The first is to clone the database software from an existing cluster node; the second is to add a RAC database instance on racnode3.
[oracle@racnode1 addnode]$ pwd /u01/app/oracle/product/12.1.0/dbhome_1/addnode [oracle@racnode1 addnode]$ ./addnode.sh "CLUSTER_NEW_NODES={racnode3}"
Note, a non-domain qualified hostname is sufficient, i.e. racnode3 rather than racnode3.mynet.com.
The addnode script invokes a database installer GUI. Click the checkbox for racnode3:

Click Next:

As usual, it’s complaining about swap space which we know we can ignore. Click the Ignore All checkbox, then click Next:

Click Yes:

On the Summary screen, click Install:

The database $ORACLE_HOME on racnode1 is cloned to racnode3:

Towards the end of the cloning process, you’ll be asked to run a root.sh script on racnode3:
[root@racnode3 dbhome_1]# pwd /u01/app/oracle/product/12.1.0/dbhome_1 [root@racnode3 dbhome_1]# ./root.sh Performing root user operation. The following environment variables are set as: ORACLE_OWNER= oracle ORACLE_HOME= /u01/app/oracle/product/12.1.0/dbhome_1 Enter the full pathname of the local bin directory: [/usr/local/bin]: The contents of "dbhome" have not changed. No need to overwrite. The contents of "oraenv" have not changed. No need to overwrite. The contents of "coraenv" have not changed. No need to overwrite. Entries will be added to the /etc/oratab file as needed by Database Configuration Assistant when a database is created Finished running generic part of root script. Now product-specific root actions will be performed.
Once the root.sh script completes, return to the GUI, click OK to confirm the completion of the root.sh script, then click Close:

Task #4.8p: Run the Database Configuration Assistant (dbca).
All that’s happened is the Oracle database software has been installed onto racnode3. To extend the RAC database to a third instance, we need to run the Database Configuration Assistant from a node where an instance is already up and running, e.g. racnode1:
[oracle@racnode1 ~]$ . oraenv ORACLE_SID = [RAC1DB1] ? The Oracle base remains unchanged with value /u01/app/oracle [oracle@racnode1 ~]$ which dbca /u01/app/oracle/product/12.1.0/dbhome_1/bin/dbca [oracle@racnode1 ~]$ dbca
Select Instance Management, then click Next:

Select Add an Instance, then click Next:

Select RAC1DB, then click Next:

Click Next:

Accept the default instance name of RAC1DB3 and click Next:

On the Summary screen, click Finish:

The GUI does its thing:

Finally, click Close:

We should now have a new instance up and running on racnode3. Let’s find out:
After adding a RAC1DB3 entry to the local /etc/oratab file:
[oracle@racnode3 dbhome_1]$ . oraenv ORACLE_SID = [oracle] ? RAC1DB3 The Oracle base has been set to /u01/app/oracle [oracle@racnode3 dbhome_1]$ sqlplus / as sysdba SQL> select instance_name from v$instance; INSTANCE_NAME ---------------- RAC1DB3
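For reference, the oratab entry added looks something like this (assuming the same database home as the other nodes):
RAC1DB3:/u01/app/oracle/product/12.1.0/dbhome_1:N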
We know we’re using ASM, but we had no direct control over where the new instance’s files were created. So let’s make sure they exist in the correct place:
SQL> select lf.member from v$logfile lf where lf.group# in ( select l.group# from v$log l where l.thread# = 3 ); MEMBER -------------------------------------------------------------------------------- +DATA/RAC1DB/ONLINELOG/group_6.270.905859405 +FRA/RAC1DB/ONLINELOG/group_6.481.905859405 +DATA/RAC1DB/ONLINELOG/group_5.271.905859405 +FRA/RAC1DB/ONLINELOG/group_5.482.905859407 SQL> select file_name from dba_data_files where tablespace_name = 'UNDOTBS3'; FILE_NAME -------------------------------------------------------------------------------- +DATA/RAC1DB/DATAFILE/undotbs3.269.905859401
Finally, let’s make sure the cluster agrees the RAC1DB3 instance is part of our setup:
[oracle@racnode3 ~]$ srvctl status database -d RAC1DB Instance RAC1DB1 is running on node racnode1 Instance RAC1DB2 is running on node racnode2 Instance RAC1DB3 is running on node racnode3
And for those of you who are paranoid like myself, let’s just see the process running:
[oracle@racnode3 dbhome_1]$ ps -ef | grep -v grep | grep pmon oracle 17052 1 0 11:37 ? 00:00:00 ora_pmon_RAC1DB3 grid 23605 1 0 Mar05 ? 00:00:11 asm_pmon_+ASM3
Task #4.9: Delete a Node From the Cluster.
Note, before we dive into deleting the node we just added, there are a few items worth mentioning so we’re all on the same page:
- Our RAC database is admin-managed and we will be removing the instance and the Oracle Database software. The procedure is different for a policy-managed database.
- Before removing an instance, ensure any services associated with that instance are relocated to other instances.
- Ensure the services are configured such that the deleted instance is neither a preferred nor an available instance (a sketch of both service changes follows this list).
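Our RAC1DB database has no services defined, but if yours does, the relocation and reconfiguration look something like this sketch, using a hypothetical service name OLTP_SVC (check srvctl relocate service -h and srvctl modify service -h for the exact syntax of your release):
[oracle@racnode1 ~]$ srvctl relocate service -d RAC1DB -s OLTP_SVC -i RAC1DB3 -t RAC1DB1
[oracle@racnode1 ~]$ srvctl modify service -d RAC1DB -s OLTP_SVC -n -i RAC1DB1,RAC1DB2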
Task #4.9a: Run the Database Configuration Assistant (dbca).
From a node other than the one which runs the instance you wish to delete, fire up dbca:
[oracle@racnode1 ~]$ . oraenv ORACLE_SID = [RAC1DB1] ? The Oracle base remains unchanged with value /u01/app/oracle [oracle@racnode1 ~]$ dbca
Select Instance Management, then click Next:

Select Delete an Instance, then click Next:

Select RAC1DB, then click Next:

Select RAC1DB3, then click Next:

Click Finish:

Click OK to proceed:

The familiar progress bar does its thing, then, rather bizarrely, a dialog box appears containing a red cross, the digit 1 and an OK button. Clicking OK closes dbca. The question is, did the removal of the RAC1DB3 instance succeed? Let's find out.
The instance isn’t running on racnode3:
[oracle@racnode3 ~]$ ps -ef | grep -v grep | grep pmon
grid     23605     1  0 Mar05 ?        00:00:13 asm_pmon_+ASM3
Let’s check the configuration of the RAC database:
[oracle@racnode3 ~]$ srvctl config database -d RAC1DB
Database unique name: RAC1DB
Database name: RAC1DB
Oracle home: /u01/app/oracle/product/12.1.0/dbhome_1
Oracle user: oracle
Spfile: +DATA/RAC1DB/PARAMETERFILE/spfile.268.898784645
Password file: +DATA/RAC1DB/PASSWORD/pwdrac1db.256.898779707
Domain: mynet.com
Start options: open
Stop options: immediate
Database role: PRIMARY
Management policy: AUTOMATIC
Server pools:
Disk Groups: FRA,DATA
Mount point paths:
Services:
Type: RAC
Start concurrency:
Stop concurrency:
OSDBA group: dba
OSOPER group: oper
Database instances: RAC1DB1,RAC1DB2
Configured nodes: racnode1,racnode2
Database is administrator managed
The cluster doesn’t see the RAC1DB3 instance as a known resource:
[oracle@racnode3 ~]$ srvctl status database -d RAC1DB
Instance RAC1DB1 is running on node racnode1
Instance RAC1DB2 is running on node racnode2
The Online Redo Logs allocated to the instance have been deleted from the database:
SQL> select lf.member from v$logfile lf where lf.group# in ( select l.group# from v$log l where l.thread# = 3 );

no rows selected
Let’s double check thread #3 has really gone:
SQL> select inst_id, group#, thread#, members from gv$log;

   INST_ID     GROUP#    THREAD#    MEMBERS
---------- ---------- ---------- ----------
         1          1          1          2
         1          2          1          2
         1          3          2          2
         1          4          2          2
         2          1          1          2
         2          2          1          2
         2          3          2          2
         2          4          2          2

8 rows selected.
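Another optional check is v$thread, which should now list only threads 1 and 2:
SQL> select thread#, status, enabled from v$thread;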
The instance's undo tablespace (UNDOTBS3) has also been removed:
SQL> select tablespace_name from dba_tablespaces where tablespace_name like 'UNDO%';

TABLESPACE_NAME
------------------------------
UNDOTBS1
UNDOTBS2
Looks like the instance RAC1DB3 has been deleted.
The Oracle Database software is still installed and the racnode3 node is still part of the cluster:
[oracle@racnode3 dbhome_1]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1

[grid@racnode3 ~]$ olsnodes -n -i -s -t -a
racnode1        1       racnode1-vip.mynet.com  Active  Hub     Unpinned
racnode2        2       racnode2-vip.mynet.com  Active  Hub     Unpinned
racnode3        3       racnode3-vip.mynet.com  Active  Hub     Unpinned
Task #4.9b: Deinstall the Oracle Database Software.
To update the Oracle Inventory on racnode3 so its node list references only racnode3 (in readiness for a local deinstall), run this:
[oracle@racnode3 bin]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1/oui/bin
[oracle@racnode3 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 "CLUSTER_NODES={racnode3}" -local
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4044 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
To delete the database $ORACLE_HOME on racnode3, run this:
[oracle@racnode3 deinstall]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1/deinstall
[oracle@racnode3 deinstall]$ ./deinstall -local
The output is quite verbose and can be viewed here. There was one failure: the script could not delete the directory /u01/app/oracle because another session had it as its current working directory. Lesson learned! Apart from that, the Oracle Database software was completely removed from racnode3.
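Lesson for next time: make sure your shell (and any other session) isn't sitting anywhere under /u01/app/oracle before launching the deinstall, e.g. run it with its full path from a neutral directory:
[oracle@racnode3 ~]$ cd /tmp
[oracle@racnode3 tmp]$ /u01/app/oracle/product/12.1.0/dbhome_1/deinstall/deinstall -local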
Finally, update the Oracle Inventories on the remaining nodes by running this:
[oracle@racnode1 bin]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1/oui/bin
[oracle@racnode1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 "CLUSTER_NODES={racnode1,racnode2}" LOCAL_NODE=racnode1
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4032 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
The Oracle Database software installation has been deleted, but racnode3 is still seen as part of the cluster:
[oracle@racnode3 oracle]$ pwd
/u01/app/oracle
[oracle@racnode3 oracle]$ ls -la
drwxrwxr-x 2 oracle oinstall 4096 Mar 7 21:09 .
drwxr-xr-x 6 root   oinstall 4096 Mar 4 10:59 ..

[grid@racnode1 ~]$ olsnodes -n -i -s -t -a
racnode1        1       racnode1-vip.mynet.com  Active  Hub     Unpinned
racnode2        2       racnode2-vip.mynet.com  Active  Hub     Unpinned
racnode3        3       racnode3-vip.mynet.com  Active  Hub     Unpinned
Task #4.9c: Deinstall the Oracle Grid Infrastructure Software.
To update the Oracle Inventory on racnode3, run this:
[grid@racnode3 bin]$ pwd
/u01/app/12.1.0/grid/oui/bin
[grid@racnode3 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/12.1.0/grid "CLUSTER_NODES={racnode3}" CRS=TRUE -silent -local
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4044 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.
To delete the GI $ORACLE_HOME on racnode3, run the following command.
Note, this will shut down Clusterware and ASM on racnode3 and relocate any SCAN listener (LISTENER_SCANn) running there to another node.
[grid@racnode3 ~]$ /u01/app/12.1.0/grid/deinstall/deinstall -local
The output is quite verbose and can be viewed here. In addition, partway through, a generated script needs to be run as the root user on racnode3. Its output can be viewed here.
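Once the GI deinstall completes, it's worth confirming the SCAN VIPs and SCAN listeners really did relocate to the surviving nodes. As the grid user on racnode1:
[grid@racnode1 ~]$ srvctl status scan
[grid@racnode1 ~]$ srvctl status scan_listener
Each SCAN VIP and SCAN listener should be reported as running on either racnode1 or racnode2.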
To update the Oracle Inventories of the remaining nodes and remove racnode3 from the cluster configuration (the crsctl command, run as root), run these commands:
[grid@racnode1 bin]$ pwd
/u01/app/12.1.0/grid/oui/bin
[grid@racnode1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/12.1.0/grid "CLUSTER_NODES={racnode1,racnode2}" CRS=TRUE -silent
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4028 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.

[oracle@racnode1 bin]$ pwd
/u01/app/oracle/product/12.1.0/dbhome_1/oui/bin
[oracle@racnode1 bin]$ ./runInstaller -updateNodeList ORACLE_HOME=/u01/app/oracle/product/12.1.0/dbhome_1 "CLUSTER_NODES={racnode1,racnode2}"
Starting Oracle Universal Installer...

Checking swap space: must be greater than 500 MB.   Actual 4028 MB    Passed
The inventory pointer is located at /etc/oraInst.loc
'UpdateNodeList' was successful.

[root@racnode1 ~]# crsctl delete node -n racnode3
CRS-4661: Node racnode3 successfully deleted.
Finally, let’s verify racnode3 is no longer part of the cluster:
[grid@racnode1 ~]$ cluvfy stage -post nodedel -n racnode3 -verbose

Performing post-checks for node removal

Checking CRS integrity...

The Oracle Clusterware is healthy on node "racnode1"

CRS integrity check passed

Clusterware version consistency passed.
Result: Node removal check passed

Post-check for node removal was successful.

[grid@racnode1 ~]$ olsnodes -n -i -s -t -a
racnode1        1       racnode1-vip.mynet.com  Active  Hub     Unpinned
racnode2        2       racnode2-vip.mynet.com  Active  Hub     Unpinned
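As a final belt-and-braces check, confirm no racnode3 resources (VIP, instance and so on) remain registered with Clusterware; this should return nothing:
[grid@racnode1 ~]$ crsctl stat res -t | grep -i racnode3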
So there it is, racnode3 successfully removed from the cluster.
That about wraps it up for Part 11. In Part 12 we tackle Oracle Recovery Manager. You never know, you might want to back something up and restore it again. See you next time.
If you have any comments or questions about this post, please use the Contact form here.