
It’s acronym time, boys and girls! Before we get to the fun part of configuring the shared storage in Part 9, we need to set up a few more, slightly less fun, items: Secure Shell (SSH) and the Domain Name System (DNS).
After that, we take the thrill ride of running the Cluster Verification Utility (CVU) for the first time. This stuff is pretty simple, so let’s crack on.
The tasks we’ll cover:
- Task #1: Set Up SSH.
- Task #2: Configure DNS.
- Task #3: Run the CVU.
Task #1: Set Up SSH.
The RAC node from which you install the Grid Infrastructure and Oracle Database software effectively becomes the node from which you perform ALL subsequent installs. Hence, the software only needs to reside on that one server; it is copied from there to all the other nodes in the cluster as part of the installation process. Consequently, this node needs remote access to the same user account on all the other nodes without being prompted for a password.
User equivalency is effectively the setup of identical user accounts on all nodes. Identical means the same username, UID, GIDs and password. The accounts which require user equivalency are oracle and grid.
The installation of the Grid Infrastructure and database software can set up SSH for you, but you’ll need SSH configured before you get to that stage in order for the CVU to weave its magic on remote nodes. That’s why we’re setting it up ahead of the software installs. There are 8 steps in total.
Task #1a: Log in as oracle and create a .ssh directory (all nodes).
[root@racnode1 ~]# su - oracle
[oracle@racnode1 ~]$ pwd
/home/oracle
[oracle@racnode1 ~]$ mkdir .ssh
[oracle@racnode1 ~]$ chmod 700 .ssh
[oracle@racnode1 ~]$ ls -la | grep ssh
drwx------ 2 oracle oinstall 4096 Dec 13 15:41 .ssh
*** Repeat for all nodes in the cluster. ***
Task #1b: Generate RSA Keys (all nodes).
[oracle@racnode1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): <just press the Enter key>
Enter same passphrase again: <just press the Enter key>
Your identification has been saved in /home/oracle/.ssh/id_rsa.
Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
3a:46:e5:58:c8:c1:f9:de:63:4f:d7:e4:29:d9:aa:b7 oracle@racnode1.mynet.com
The key's randomart image is:
<snipped>
*** Repeat for all nodes in the cluster. ***
Task #1c: Generate DSA Keys (all nodes).
[oracle@racnode1 ~]$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase): <just press the Enter key>
Enter same passphrase again: <just press the Enter key>
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
32:40:a2:1f:e6:85:23:2f:80:e6:a0:15:9f:5e:01:9e oracle@racnode1.mynet.com
The key's randomart image is:
<snipped>
*** Repeat for all nodes in the cluster. ***
Task #1d: Create the authorized_keys file (all nodes).
[oracle@racnode1 ~]$ touch ~/.ssh/authorized_keys
[oracle@racnode1 ~]$ ls -l .ssh
-rw-r--r-- 1 oracle oinstall    0 Dec 13 15:55 authorized_keys
-rw------- 1 oracle oinstall  668 Dec 13 15:47 id_dsa
-rw-r--r-- 1 oracle oinstall  617 Dec 13 15:47 id_dsa.pub
-rw------- 1 oracle oinstall 1675 Dec 13 15:46 id_rsa
-rw-r--r-- 1 oracle oinstall  409 Dec 13 15:46 id_rsa.pub
*** Repeat for all nodes in the cluster. ***
Task #1e: Capture Key Fingerprints to the authorized_keys file (first node).
First, ssh into the node you’re logged into and capture the output to authorized_keys.
[oracle@racnode1 ~]$ cd .ssh
[oracle@racnode1 .ssh]$ ssh racnode1 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
The authenticity of host 'racnode1 (200.200.10.11)' can't be established.
RSA key fingerprint is 35:a5:78:68:36:c3:c2:42:f5:df:da:5f:2c:56:2b:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'racnode1,200.200.10.11' (RSA) to the list of known hosts.
oracle@racnode1's password: <enter the oracle password>
[oracle@racnode1 .ssh]$ ssh racnode1 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
*** Repeat these steps by using ssh to connect to every other node in the cluster, again capturing the output into authorized_keys. ***
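For example, with a second node called racnode2, the equivalent commands run from racnode1 would look something like this (you’ll be prompted for the oracle password on racnode2 each time):

[oracle@racnode1 .ssh]$ ssh racnode2 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
[oracle@racnode1 .ssh]$ ssh racnode2 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys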
Task #1f: Copy the authorized_keys file to every node.
The authorized_keys file on the first node now contains the public keys of every node in the cluster, which is exactly what the other nodes need. Use scp to copy it to them.
[oracle@racnode1 .ssh]$ scp authorized_keys racnode2:/home/oracle/.ssh
oracle@racnode2's password: <enter the oracle password>
authorized_keys                              100% 2052     2.0KB/s   00:00
*** Repeat the scp to each node in the cluster. ***
Task #1g: Secure the authorized_keys file (all nodes).
[oracle@racnode1 .ssh]$ chmod 600 authorized_keys
*** Repeat for all nodes in the cluster. ***
Task #1h: Test passwordless connectivity both ways between all nodes.
[oracle@racnode1 .ssh]$ ssh racnode2 date
Sun Dec 13 16:30:13 CST 2015
[oracle@racnode2 .ssh]$ ssh racnode1 date
The authenticity of host 'racnode1 (200.200.10.11)' can't be established.
RSA key fingerprint is 35:a5:78:68:36:c3:c2:42:f5:df:da:5f:2c:56:2b:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'racnode1,200.200.10.11' (RSA) to the list of known hosts.
Sun Dec 13 16:30:32 CST 2015
[oracle@racnode2 .ssh]$ ssh racnode1 date
Sun Dec 13 16:30:35 CST 2015
Once you have SSH set up (also known as “user equivalency”) for the oracle user, repeat the same 8 steps for the grid user.
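If you want a quick sanity check once both accounts are done, a simple loop like this (run as oracle and again as grid, from each node in turn) should print a date from every node without a single password prompt. A minimal sketch, assuming a two node cluster named racnode1 and racnode2:

# run as oracle, then again as grid, from each node in turn
for node in racnode1 racnode2
do
  echo "--- ${node} ---"
  ssh ${node} date
done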
Task #2: Configure DNS.
For the most part you can get away with using /etc/hosts files to resolve hostnames. However, with the introduction of the Single Client Access Name (SCAN) in Oracle Database 11g Release 2, things got a little more complicated. The cluster SCAN needs to resolve to 3 IP addresses, which an /etc/hosts file can’t do.
There is a ‘smoke and mirrors’ way to get a tweaked nslookup script to return 3 IP addresses, but why not just configure DNS properly? In a production environment there would already be a dedicated DNS server, but in our environment using a whole VM just for DNS would be excessive to say the least. The smarter option is to use a server that’s already doing something else and is always on. Top of the list is our Oracle Enterprise Manager Cloud Control server, oraemcc.mynet.com (IP: 200.200.10.16). It won’t mind, and DNS won’t interfere with its other duties.
Whenever I’ve asked a production Network Administrator to set this up for me, I’ve sometimes (not always) got the eye roll and look of exasperation. Like I’m asking them to split the atom or something. In truth, setting up DNS isn’t that difficult, but it is fiddly. One semicolon, bracket or period out of place and the whole thing doesn’t work and, more infuriatingly, won’t tell you why. So, with your keenest eye on the prize, let’s configure DNS using these 7 steps.
Task #2a: Install the bind package.
DNS functionality is provided by the bind packages, which can be installed using yum:
[root@oraemcc ~]# yum install bind-libs bind bind-utils
Task #2b: Edit /etc/named.conf.
The installation of bind will create the file /etc/named.conf. It will contain multiple lines, most of which you will not need to edit. Before you start to edit this file, take a copy of it in case you need to start over. You will also need to have several pieces of information to hand to correctly edit this and subsequent files. They are:
Information | Value |
---|---|
Hostname of your DNS Server | oraemcc.mynet.com |
IP Address of your DNS Server | 200.200.10.16 |
IP Address of your ISP’s DNS Servers | 8.8.8.8 & 8.8.4.4 |
Internal network domain name | mynet.com |
Only a handful of items in /etc/named.conf need to change for your environment; a sketch of what the edited file might look like appears after the notes below. When you edit your file, be VERY CAREFUL with the double quotes and periods. Missing something seemingly so innocent will make you wish you hadn’t. Three important points about this file:
- The forwarders line is there so server names which are not on your internal network can still be resolved. These servers would likely be on the internet, so your ISP’s DNS servers handle those lookups instead.
- The zone “10.200.200.in-addr.arpa.” line is there to facilitate reverse lookups. The 10.200.200 part describes the first three octets of the IP address your DNS server is on, but in reverse order. Our DNS server has the IP address 200.200.10.16, hence 10.200.200.
- The ownership and file permissions are crucial. Get these wrong and DNS won’t start up.
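Here is a minimal sketch of what the edited /etc/named.conf might look like for this environment. It assumes the values from the table above (DNS server oraemcc at 200.200.10.16, domain mynet.com, forwarders 8.8.8.8 and 8.8.4.4) and the zone file names used in the next two tasks; it’s a starting point rather than a complete production configuration, so adjust it to suit your own network:

// Forward queries for names outside mynet.com to the ISP's DNS servers.
options {
        listen-on port 53 { 127.0.0.1; 200.200.10.16; };
        directory       "/var/named";
        allow-query     { localhost; 200.200.10.0/24; };
        recursion       yes;
        forwarders      { 8.8.8.8; 8.8.4.4; };
};

// Forward lookup zone for the internal domain.
zone "mynet.com." IN {
        type master;
        file "mynet.com.zone";
        allow-update { none; };
};

// Reverse lookup zone for the 200.200.10.x network (first three octets reversed).
zone "10.200.200.in-addr.arpa." IN {
        type master;
        file "10.200.200.in-addr.arpa";
        allow-update { none; };
};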
[root@oraemcc ~]# ls -l /etc/named.conf
-rw-r----- 1 root named 1114 Aug 19 17:15 /etc/named.conf
Notice the file is owned by root, with named as the group. The file permissions are 640. If your file does not match this, then run these commands:
[root@oraemcc ~]# chgrp named /etc/named.conf
[root@oraemcc ~]# chmod 640 /etc/named.conf
Task #2c: Create /var/named/mynet.com.zone file.
The next step is to create your /var/named/mynet.com.zone file which will resolve hostnames on your domain, mynet.com.
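To give you an idea of the layout, here is a minimal sketch of what the zone file might contain, using the hostnames and addresses that appear elsewhere in this series. The serial number and timer values are illustrative, and any VIPs or other hosts your cluster needs are added in the same way:

$TTL 86400
@       IN      SOA     oraemcc.mynet.com. root.mynet.com. (
                        2015122101      ; serial
                        3600            ; refresh
                        1800            ; retry
                        604800          ; expire
                        86400 )         ; minimum TTL

; Name server for the zone.
@               IN      NS      oraemcc.mynet.com.

; Hosts.
oraemcc         IN      A       200.200.10.16
racnode1        IN      A       200.200.10.11
racnode2        IN      A       200.200.10.12

; The SCAN resolves to three addresses, returned round-robin.
cluster1-scan   IN      A       200.200.10.120
cluster1-scan   IN      A       200.200.10.121
cluster1-scan   IN      A       200.200.10.122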
Adjust the entries to match your own environment, then ensure the file has the correct group ownership and file permissions:
[root@oraemcc ~]# chgrp named /var/named/mynet.com.zone
[root@oraemcc ~]# chmod 640 /var/named/mynet.com.zone
[root@oraemcc ~]# ls -l /var/named/mynet.com.zone
-rw-r----- 1 root named 1133 Dec 21 12:44 /var/named/mynet.com.zone
Task #2d: Create /var/named/10.200.200.in-addr.arpa file.
This step creates the /var/named/10.200.200.in-addr.arpa file which does reverse lookups for addresses on the 200.200.10.x network. It’s not strictly necessary to implement reverse lookups, but since we’re in the neighborhood we may as well.
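As a guide, here is a minimal sketch of what the reverse zone file might contain, matching the forward zone entries above (again, the serial and timer values are illustrative):

$TTL 86400
@       IN      SOA     oraemcc.mynet.com. root.mynet.com. (
                        2015122101      ; serial
                        3600            ; refresh
                        1800            ; retry
                        604800          ; expire
                        86400 )         ; minimum TTL

; Name server for the zone.
@       IN      NS      oraemcc.mynet.com.

; PTR records: the left-hand value is the last octet of the IP address.
16      IN      PTR     oraemcc.mynet.com.
11      IN      PTR     racnode1.mynet.com.
12      IN      PTR     racnode2.mynet.com.
120     IN      PTR     cluster1-scan.mynet.com.
121     IN      PTR     cluster1-scan.mynet.com.
122     IN      PTR     cluster1-scan.mynet.com.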
As before, adjust the entries to match your own environment and ensure the file has the correct group ownership and file permissions:
[root@oraemcc ~]# chgrp named /var/named/10.200.200.in-addr.arpa
[root@oraemcc ~]# chmod 640 /var/named/10.200.200.in-addr.arpa
[root@oraemcc ~]# ls -l /var/named/10.200.200.in-addr.arpa
-rw-r----- 1 root named 916 Dec 21 12:26 10.200.200.in-addr.arpa
Task #2e: Edit /etc/resolv.conf.
Create or edit a /etc/resolv.conf file on each server which will use your DNS setup. These are the only lines which should be in this file:
search mynet.com
nameserver 200.200.10.16
It is important that this file be identical across all the RAC nodes, as it’s one of the things the CVU checks for. It’s also the file which NetworkManager trashes each time networking starts up on a server, so you need it to remain intact and consistent.
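One common way to stop NetworkManager and the DHCP client from rewriting /etc/resolv.conf on Oracle Linux 6 is to set these options in the interface’s ifcfg file. A sketch, assuming eth0; check your own interface names:

# /etc/sysconfig/network-scripts/ifcfg-eth0 (relevant lines only)
# Keep NetworkManager's hands off this interface:
NM_CONTROLLED=no
# Stop DHCP from overwriting /etc/resolv.conf:
PEERDNS=no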
Task #2f: Start the DNS service.
[root@oraemcc ~]# service named start
Starting named:                                            [  OK  ]
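If you also want named to start automatically after a reboot, enabling it with chkconfig is the usual approach (an optional extra the CVU doesn’t check for):

[root@oraemcc ~]# chkconfig named on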
Checking the status of DNS generates some interesting output:
[root@oraemcc ~]# service named status
version: 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.5
CPUs found: 2
worker threads: 2
number of zones: 21
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid 9931) is running...
Task #2g: Test DNS is working.
We have configured DNS and have started the service, so now’s the time to make sure it actually works.
First, test a hostname lookup:
[root@racnode1 ~]# nslookup racnode2
Server:         200.200.10.16
Address:        200.200.10.16#53

Name:   racnode2.mynet.com
Address: 200.200.10.12
Next, test the lookup of the cluster SCAN:
[root@racnode1 ~]# nslookup cluster1-scan
Server:         200.200.10.16
Address:        200.200.10.16#53

Name:   cluster1-scan.mynet.com
Address: 200.200.10.122
Name:   cluster1-scan.mynet.com
Address: 200.200.10.120
Name:   cluster1-scan.mynet.com
Address: 200.200.10.121
Finally, test a reverse lookup on an IP address:
[root@racnode1 ~]# nslookup 200.200.10.120
Server:         200.200.10.16
Address:        200.200.10.16#53

120.10.200.200.in-addr.arpa     name = cluster1-scan.mynet.com.
Bingo! It works! 🎆
Task #3: Run the CVU.
The Cluster Verification Utility (CVU) script, runcluvfy.sh, is included with the 12c Grid Infrastructure software. You should have already downloaded and unzipped these files. If not, refer to this prior step.
Log in to racnode1 as the grid user, locate the CVU script, then run it with the parameters shown:
[grid@racnode1 ~]$ cd ./media/gi_12102/grid
[grid@racnode1 grid]$ ./runcluvfy.sh stage -pre crsinst -n racnode1,racnode2 -verbose 2>&1 | tee cvu.out
This will run the CVU, echo its output to the screen and copy the output to a file called cvu.out.
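The output is long, so a quick way to pull out just the failures from cvu.out is something as simple as this (a convenience, not part of the CVU itself):

[grid@racnode1 grid]$ grep -i failed cvu.out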
The vast majority of the checks passed. Therefore, it only really makes sense to highlight the three which failed and, more importantly, what to do about them.
Failure #1: IPv6 Addressing.
The first error has to do with IPv6 addressing. Since we’re not using IPv6 (one day perhaps), we could ignore this error. However, IPv6 interface information will come up during the Grid Infrastructure installation, which only serves to confuse matters. I therefore prefer to remove the IPv6 references. These are the errors:
Check: TCP connectivity of subnet "2002:4b49:1933::"
  Source                                          Destination                                     Connected?
  ----------------------------------------------  ----------------------------------------------  ----------
  racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298  racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298  passed
  racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0  racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298  failed

ERROR:
PRVG-11850 : The system call "connect" failed with error "13" while executing exectask on node "racnode2"
Permission denied

  racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298  racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0  failed

ERROR:
PRVG-11850 : The system call "connect" failed with error "13" while executing exectask on node "racnode1"
Permission denied

  racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0  racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0  passed

Result: TCP connectivity check failed for subnet "2002:4b49:1933::"
You can confirm the presence of IPv6 addressing by checking the output of the ifconfig command. For example:
[root@racnode1 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:21:F6:04:42:98
          inet addr:200.200.10.11  Bcast:200.200.10.255  Mask:255.255.255.0
          inet6 addr: fe80::221:f6ff:fe04:4298/64 Scope:Link
          inet6 addr: 2002:4b49:1933:0:221:f6ff:fe04:4298/64 Scope:Global
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:529808 errors:0 dropped:46361 overruns:0 frame:0
          TX packets:43797 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:79241367 (75.5 MiB)  TX bytes:18208947 (17.3 MiB)
The clues are the lines beginning with “inet6”. To disable IPv6 across the board, add the following line to the /etc/sysctl.conf file:
[root@racnode1 ~]# vi /etc/sysctl.conf

# disable IPv6 support on all network interfaces:
net.ipv6.conf.all.disable_ipv6 = 1
If you only wanted to disable IPv6 support for a specific interface, for example eth0, then the entry in /etc/sysctl.conf would look like this:
# disable IPv6 support on the eth0 network interface:
net.ipv6.conf.eth0.disable_ipv6 = 1
To have this change take effect, either reboot or run this command:
[root@racnode1 ~]# sysctl -p /etc/sysctl.conf
To confirm the change has taken effect, re-run the ifconfig command:
[root@racnode1 ~]# ifconfig eth0
eth0      Link encap:Ethernet  HWaddr 00:21:F6:04:42:98
          inet addr:200.200.10.11  Bcast:200.200.10.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:6481 errors:0 dropped:456 overruns:0 frame:0
          TX packets:464 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1277621 (1.2 MiB)  TX bytes:92893 (90.7 KiB)
Failure #2: Network Time Protocol (NTP).
The second failure reported in the CVU output had to do with Network Time Protocol (NTP). You will recall that during the installation of Oracle Linux on racnode1 we chose not to synchronize the date and time over the network. That’s because I prefer to use Oracle’s Cluster Time Synchronization Service instead. These are the NTP errors reported by the CVU:
Starting Clock synchronization checks using Network Time Protocol(NTP)...

Checking existence of NTP configuration file "/etc/ntp.conf" across nodes
  Node Name                             File exists?
  ------------------------------------  ------------------------
  racnode2                              yes
  racnode1                              yes
The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP configuration file "/etc/ntp.conf" existence check passed
No NTP Daemons or Services were found to be running
PRVF-5507 : NTP daemon or service is not running on any node but NTP configuration file exists on the following node(s):
racnode2,racnode1
Result: Clock synchronization check using Network Time Protocol(NTP) failed
When Grid Infrastructure is installed, it will detect whether NTP is running. If it is, Cluster Time Synchronization Service is started in “observer mode”. If NTP is not running, Cluster Time Synchronization Service is started in “active mode”, which is what we want. However, if NTP is down but the installer sees the NTP configuration file /etc/ntp.conf, it assumes NTP will spontaneously start by itself and the world as we know it will end. To prevent that, we need to ensure NTP is down, stays down and can’t find its configuration file. We do that by renaming the file on each node:
[root@racnode1 ~]# service ntpd status
ntpd is stopped
[root@racnode1 ~]# chkconfig ntpd off
[root@racnode1 ~]# mv /etc/ntp.conf /etc/ntp.conf.ORIG
Thus the world is saved. Hurrah! ☑
Failure #3: Insufficient Swap Space.
I originally built the racnodes with 4 GB of RAM, but things ran a bit slow, so I upped the RAM to 6 GB. One of the great things about virtualization is the ability to add more resources to a VM through software, assuming the resources are physically present and available.
Note that Oracle VM cannot over-allocate physical server memory; you can only allocate what is physically available in the Oracle VM Server.
Since I originally had 4 GB of RAM allocated, I also configured 4 GB of swap space. Increasing the amount of RAM means you should also increase the amount of swap space. Here’s the official Oracle sizing table:
RAM | Swap Space |
---|---|
Between 1 GB & 2 GB | 1.5x RAM |
Between 2 GB & 16 GB | Same size as RAM |
Greater than 16 GB | 16 GB |
Here is what the CVU reported regarding memory and swap space:
Check: Total memory
  Node Name     Available                 Required                  Status
  ------------  ------------------------  ------------------------  ----------
  racnode2      5.8369GB (6120432.0KB)    4GB (4194304.0KB)         passed
  racnode1      5.8369GB (6120432.0KB)    4GB (4194304.0KB)         passed
Result: Total memory check passed

Check: Swap space
  Node Name     Available                 Required                  Status
  ------------  ------------------------  ------------------------  ----------
  racnode2      4GB (4194300.0KB)         5.8369GB (6120432.0KB)    failed
  racnode1      4GB (4194300.0KB)         5.8369GB (6120432.0KB)    failed
Result: Swap space check failed
There are a few ways to increase the amount of swap space on a Linux server. If you have sufficient free disk space you can add an additional swap partition. If the current swap space is in a logical volume, you could potentially resize it. The quickest and simplest way, though, is to add a swap file. Since this is a non-production system, swapping won’t be a problem, so we can safely ignore this failure. If it were a production system, we’d obviously do the right thing… blame the Systems Administrator! 👺 For completeness, here’s what adding a swap file looks like.
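A minimal sketch, assuming a 2 GB swap file called /swapfile01 (both the name and size are illustrative), run as root on each node that needs it:

[root@racnode1 ~]# dd if=/dev/zero of=/swapfile01 bs=1M count=2048
[root@racnode1 ~]# chmod 600 /swapfile01
[root@racnode1 ~]# mkswap /swapfile01
[root@racnode1 ~]# swapon /swapfile01
[root@racnode1 ~]# echo "/swapfile01 swap swap defaults 0 0" >> /etc/fstab
[root@racnode1 ~]# swapon -s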
Well, that about wraps it up for SSH, DNS and CVU.
See you next time for all things shared storage in Part 9.
If you have any comments or questions about this post, please use the Contact form here.