Build Your Own Oracle Infrastructure: Part 8 – SSH, DNS & CVU.


It’s acronym time boys and girls! Before we get to the fun part of configuring the shared storage in Part 9, we need to set up a few more, slightly less fun, items.

Secure Shell (SSH) and the Domain Name System (DNS).

After that we take the thrill ride of running the Cluster Verification Utility (CVU) for the first time. This stuff is pretty simple, so let’s crack on.


Task #1: Setup SSH.

The RAC node from which you install the Grid Infrastructure and Oracle Database software effectively becomes the node from which you perform ALL subsequent installs. Hence, the software only needs to reside on that one server; it is copied from there to all the other nodes in the cluster as part of the installation process. Consequently, this node needs remote access to the same user account on all the other nodes without being prompted for a password.

User equivalency is effectively the setup of identical user accounts on all nodes. Identical means the same username, UID, GIDs and password. The accounts which require user equivalency are oracle and grid.
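
A quick way to confirm the accounts really do match is to compare the output of the id command on each node (run it locally on each node, since passwordless SSH isn’t in place yet at this point). The actual UID/GID values are whatever you chose when creating the accounts:

[root@racnode1 ~]# id oracle
[root@racnode1 ~]# id grid

[root@racnode2 ~]# id oracle
[root@racnode2 ~]# id grid

The username, uid, gid and groups reported should be identical on every node.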

The installation of the Grid Infrastructure and database software can set up SSH for you, but you’ll need SSH configured before you get to that stage in order for the CVU to weave its magic on remote nodes. That’s why we’re setting it up ahead of the software installs. There are 8 steps in total.

Task #1a: Login as oracle and create an ssh directory (all nodes).

[root@racnode1 ~]# su - oracle
[oracle@racnode1 ~]$ pwd
/home/oracle

[oracle@racnode1 ~]$ mkdir .ssh
[oracle@racnode1 ~]$ chmod 700 .ssh

[oracle@racnode1 ~]$ ls -la | grep ssh 
drwx------ 2 oracle oinstall 4096 Dec 13 15:41 .ssh  

*** Repeat for all nodes in the cluster. ***

Task #1b: Generate RSA Keys (all nodes).

[oracle@racnode1 ~]$ ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_rsa):
Enter passphrase (empty for no passphrase): <just press the Enter key>
Enter same passphrase again: <just press the Enter key>
Your identification has been saved in /home/oracle/.ssh/id_rsa.
Your public key has been saved in /home/oracle/.ssh/id_rsa.pub.
The key fingerprint is:
3a:46:e5:58:c8:c1:f9:de:63:4f:d7:e4:29:d9:aa:b7 oracle@racnode1.mynet.com
The key's randomart image is:
<snipped>

*** Repeat for all nodes in the cluster. ***

Task #1c: Generate DSA Keys (all nodes).

[oracle@racnode1 ~]$ ssh-keygen -t dsa
Generating public/private dsa key pair.
Enter file in which to save the key (/home/oracle/.ssh/id_dsa):
Enter passphrase (empty for no passphrase): <just press the Enter key>
Enter same passphrase again: <just press the Enter key>
Your identification has been saved in /home/oracle/.ssh/id_dsa.
Your public key has been saved in /home/oracle/.ssh/id_dsa.pub.
The key fingerprint is:
32:40:a2:1f:e6:85:23:2f:80:e6:a0:15:9f:5e:01:9e oracle@racnode1.mynet.com
The key's randomart image is:
<snipped>

*** Repeat for all nodes in the cluster. ***

Task #1d: Create the authorized_keys file (all nodes).

[oracle@racnode1 ~]$ touch ~/.ssh/authorized_keys

[oracle@racnode1 ~]$ ls -l .ssh 
-rw-r--r-- 1 oracle oinstall    0 Dec 13 15:55  authorized_keys 
-rw------- 1 oracle oinstall  668 Dec 13 15:47 id_dsa  
-rw-r--r-- 1 oracle oinstall  617 Dec 13 15:47 id_dsa.pub 
-rw------- 1  oracle oinstall 1675 Dec 13 15:46 id_rsa 
-rw-r--r-- 1 oracle oinstall   409 Dec 13 15:46 id_rsa.pub  

*** Repeat for all nodes in the cluster. ***

Task #1e: Capture Public Keys into the authorized_keys file (first node).

First, ssh back into the node you’re already logged into and append its RSA and DSA public keys to authorized_keys.

[oracle@racnode1 ~]$ cd .ssh

[oracle@racnode1 .ssh]$ ssh racnode1 cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys  
The authenticity of host 'racnode1 (200.200.10.11)' can't be established.
RSA key fingerprint is 35:a5:78:68:36:c3:c2:42:f5:df:da:5f:2c:56:2b:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'racnode1,200.200.10.11' (RSA) to the list of known hosts.
oracle@racnode1's password: <enter the oracle password>

[oracle@racnode1 .ssh]$ ssh racnode1 cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys 

*** Repeat these steps by using ssh to connect to every other node in the cluster, again capturing the output into authorized_keys. ***
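
Rather than typing the ssh commands for every node by hand, a small loop achieves the same thing (a sketch assuming the two-node cluster used in this series; adjust the node list to suit, and skip any node whose keys you’ve already appended):

[oracle@racnode1 .ssh]$ for node in racnode1 racnode2
> do
>   ssh $node cat /home/oracle/.ssh/id_rsa.pub >> authorized_keys
>   ssh $node cat /home/oracle/.ssh/id_dsa.pub >> authorized_keys
> done

You’ll still be prompted for the oracle password (and the host key confirmation) for each node the first time around.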

Task #1f: Copy the authorized_keys file to every node.

The authorized_keys file on the first node now contains all the data which all the other nodes need. Use scp to copy it to the other nodes.

[oracle@racnode1 .ssh]$ scp authorized_keys racnode2:/home/oracle/.ssh
oracle@racnode2's password: <enter the oracle password>
authorized_keys                          100% 2052     2.0KB/s   00:00

*** Repeat the scp to each node in the cluster. ***

Task #1g: Secure the authorized_keys file (all nodes).

[oracle@racnode1 .ssh]$ chmod 600 authorized_keys

*** Repeat for all nodes in the cluster. ***

Task #1h: Test passwordless connectivity both ways between all nodes.

[oracle@racnode1 .ssh]$ ssh racnode2 date
Sun Dec 13 16:30:13 CST 2015
[oracle@racnode2 .ssh]$ ssh racnode1 date
The authenticity of host 'racnode1 (200.200.10.11)' can't be established.
RSA key fingerprint is 35:a5:78:68:36:c3:c2:42:f5:df:da:5f:2c:56:2b:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'racnode1,200.200.10.11' (RSA) to the list of known hosts.
Sun Dec 13 16:30:32 CST 2015

[oracle@racnode2 .ssh]$ ssh racnode1 date
Sun Dec 13 16:30:35 CST 2015
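
To be thorough, you can loop over every node from every node and make sure no password or host key prompts appear anywhere (a sketch assuming the two-node cluster used here):

[oracle@racnode1 .ssh]$ for node in racnode1 racnode2; do ssh $node date; done
[oracle@racnode2 .ssh]$ for node in racnode1 racnode2; do ssh $node date; done

Each loop should return a date per node with no prompting of any kind.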

Once you have SSH set up (also known as “user equivalency”) for the oracle user, repeat the same 8 steps for the grid user.

Task #2: Configure DNS.

For the most part you can get away with using /etc/hosts files to resolve hostnames. However, with the introduction of the Single Client Access Name (SCAN) in Oracle Database 11g Release 2, things got a little more complicated. A cluster SCAN needs to resolve to 3 IP addresses, so an /etc/hosts file won’t help.

There is a ‘smoke and mirrors’ way to get a tweaked nslookup script to return 3 IP addresses, but why not just configure DNS properly? In a production environment, there would already be a dedicated DNS server, but in our environment using a whole VM just for DNS would be excessive to say the least. The smart way to go would be to use a server that’s used for something else and is always on. Top of the list is our Oracle Enterprise Manager Cloud Control server, oraemcc.mynet.com (IP: 200.200.10.16). It won’t mind and DNS won’t interfere with its other duties.

Whenever I’ve asked a production Network Administrator to set this up for me, I’ve sometimes (not always) got the eye roll and look of exasperation, like I’m asking them to split the atom or something. In truth, setting up DNS isn’t that difficult, but it is fiddly. One semicolon, bracket or period out of place and the whole thing doesn’t work and, more infuriatingly, won’t tell you why. So, with your keenest eye on the prize, let’s configure DNS using these 7 steps.

Task #2a: Install the bind package.

DNS functionality is provided by installing the bind package. This can be done using yum:

[root@oraemcc ~]# yum install bind-libs bind bind-utils
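
A quick way to confirm the packages are in place afterwards (the versions on your system will differ):

[root@oraemcc ~]# rpm -q bind bind-utils bind-libs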

Task #2b: Edit /etc/named.conf.

The installation of bind will create the file /etc/named.conf. It will contain multiple lines, most of which you will not need to edit. Before you start to edit this file, take a copy of it in case you need to start over. You will also need to have several pieces of information to hand to correctly edit this and subsequent files. They are:

Information                               Value
----------------------------------------  -----------------
Hostname of your DNS Server               oraemcc.mynet.com
IP Address of your DNS Server             200.200.10.16
IP Address of your ISP’s DNS Servers      8.8.8.8 & 8.8.4.4
Internal network domain name              mynet.com

The items which need to be edited are shown in red in this example /etc/named.conf file. Copy this file to your own system and make any necessary changes. When you edit your file be VERY CAREFUL with the double quotes and periods. Missing something seemingly so innocent will make you wish you hadn’t. Three important points about this file.

  1. The forwarders line is there so server names which are not on your internal network can still be resolved. These servers would likely be on the internet, so your ISP’s DNS servers handle those lookups instead.
  2. The zone “10.200.200.in-addr.arpa.” line is there to facilitate reverse lookups. The 10.200.200 part describes the first three octets of the IP address your DNS server is on, but in reverse order. Our DNS server has the IP address 200.200.10.16, hence 10.200.200.
  3. The ownership and file permissions are crucial. Get these wrong and DNS won’t start.
[root@oraemcc ~]# ls -l /etc/named.conf
-rw-r----- 1 root named 1114 Aug 19 17:15 /etc/named.conf

Notice the file is owned by root, with named as the group. The file permissions are 640. If your file does not match this, then run these commands:

[root@oraemcc ~]# chgrp named /etc/named.conf
[root@oraemcc ~]# chmod 640 /etc/named.conf
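
For reference, the relevant parts of /etc/named.conf end up looking something like this (a sketch only, not the complete file the bind package installs; the IP addresses, forwarders and zone file names match the values in the table above and should be adjusted to your own environment):

options {
        listen-on port 53 { 127.0.0.1; 200.200.10.16; };
        directory       "/var/named";
        allow-query     { localhost; 200.200.10.0/24; };
        recursion       yes;
        forwarders      { 8.8.8.8; 8.8.4.4; };
};

zone "mynet.com." IN {
        type master;
        file "mynet.com.zone";
        allow-update { none; };
};

zone "10.200.200.in-addr.arpa." IN {
        type master;
        file "10.200.200.in-addr.arpa";
        allow-update { none; };
};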

Task #2c: Create /var/named/mynet.com.zone file.

The next step is to create your /var/named/mynet.com.zone file which will resolve hostnames on your domain, mynet.com.

The items which need to be edited are shown in red in this example /var/named/mynet.com.zone file. Copy this file to your own system and make any necessary changes. Ensure this file has the correct group ownership and file permissions:

[root@oraemcc ~]# chgrp named /var/named/mynet.com.zone
[root@oraemcc ~]# chmod 640 /var/named/mynet.com.zone

[root@oraemcc ~]# ls -l /var/named/mynet.com.zone
-rw-r----- 1 root named 1133 Dec 21 12:44 /var/named/mynet.com.zone
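
As a rough guide, a forward zone file along these lines would cover the hosts we’ve used so far (a sketch only; the serial number, timers and any additional entries such as the node VIPs are up to you, but the SCAN name must have three A records):

$TTL 86400
@       IN  SOA  oraemcc.mynet.com. root.mynet.com. (
                 2015122101 ; serial
                 3600       ; refresh
                 1800       ; retry
                 604800     ; expire
                 86400 )    ; minimum
        IN  NS   oraemcc.mynet.com.

oraemcc        IN  A  200.200.10.16
racnode1       IN  A  200.200.10.11
racnode2       IN  A  200.200.10.12
cluster1-scan  IN  A  200.200.10.120
cluster1-scan  IN  A  200.200.10.121
cluster1-scan  IN  A  200.200.10.122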

Task #2d: Create /var/named/10.200.200.in-addr.arpa file.

This step creates the /var/named/10.200.200.in-addr.arpa file which does reverse lookups for addresses on the 200.200.10.x network. It’s not strictly necessary to implement reverse lookups, but since we’re in the neighborhood we may as well.

The items which need to be edited are shown in red in this example /var/named/10.200.200.in-addr.arpa file. Copy this file to your own system and make any necessary changes. Again, ensure the file has the correct group and file permissions.

[root@oraemcc ~]# chgrp named /var/named/10.200.200.in-addr.arpa
[root@oraemcc ~]# chmod 640 /var/named/10.200.200.in-addr.arpa

[root@oraemcc ~]# ls -l /var/named/10.200.200.in-addr.arpa 
-rw-r----- 1 root  named  916 Dec 21 12:26 10.200.200.in-addr.arpa  
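
Again as a rough sketch, the reverse zone file mirrors the forward zone using PTR records keyed on the last octet of each address (adjust the hosts and serial number to your own setup):

$TTL 86400
@    IN  SOA  oraemcc.mynet.com. root.mynet.com. (
              2015122101 ; serial
              3600       ; refresh
              1800       ; retry
              604800     ; expire
              86400 )    ; minimum
     IN  NS   oraemcc.mynet.com.

16   IN  PTR  oraemcc.mynet.com.
11   IN  PTR  racnode1.mynet.com.
12   IN  PTR  racnode2.mynet.com.
120  IN  PTR  cluster1-scan.mynet.com.
121  IN  PTR  cluster1-scan.mynet.com.
122  IN  PTR  cluster1-scan.mynet.com.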

Task #2e: Edit /etc/resolv.conf.

Create or edit a /etc/resolv.conf file on each server which will use your DNS setup. These are the only lines which should be in this file:

search mynet.com
nameserver 200.200.10.16

It is important that this file be identical across all the RAC nodes, as it’s one of the things the CVU checks for. Remember, this is the file which NetworkManager trashes each time networking starts up on a server, so keep an eye on it to make sure it remains intact and consistent.
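
One way to stop the file being rewritten (assuming the ifcfg-style network scripts used on Oracle Linux 6) is to set PEERDNS=no in each interface configuration file, for example:

[root@racnode1 ~]# grep PEERDNS /etc/sysconfig/network-scripts/ifcfg-eth0
PEERDNS=no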

Task #2f: Start the DNS service.

[root@oraemcc ~]# service named start
Starting named:                                            [  OK  ]

Checking the status of DNS generates some interesting output:

[root@oraemcc ~]# service named status
version: 9.8.2rc1-RedHat-9.8.2-0.37.rc1.el6_7.5
CPUs found: 2
worker threads: 2
number of zones: 21
debug level: 0
xfers running: 0
xfers deferred: 0
soa queries in progress: 0
query logging is OFF
recursive clients: 0/0/1000
tcp clients: 0/100
server is up and running
named (pid  9931) is running...
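
You’ll probably also want named to come back automatically after a reboot (assuming the SysV init setup used throughout this series):

[root@oraemcc ~]# chkconfig named on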

Task #2g: Test DNS is working.

We have configured DNS and have started the service, so now’s the time to make sure it actually works.

First, test a hostname lookup:

[root@racnode1 ~]# nslookup racnode2
Server:         200.200.10.16
Address:        200.200.10.16#53

Name:   racnode2.mynet.com
Address: 200.200.10.12

Next, test the lookup of the cluster SCAN:

[root@racnode1 ~]# nslookup cluster1-scan
Server:        200.200.10.16
Address:       200.200.10.16#53

Name:   cluster1-scan.mynet.com
Address: 200.200.10.122
Name:   cluster1-scan.mynet.com
Address: 200.200.10.120
Name:   cluster1-scan.mynet.com
Address: 200.200.10.121

Finally, test a reverse lookup on an IP address:

[root@racnode1 ~]# nslookup 200.200.10.120
Server:        200.200.10.16
Address:       200.200.10.16#53
120.10.200.200.in-addr.arpa     name = cluster1-scan.mynet.com.

Bingo! It works!

Task #3: Run the CVU.

The Cluster Verification Utility (CVU) script, runcluvfy.sh, is included with the 12c Grid Infrastructure software. You should have already downloaded and unzipped these files. If not, refer to this prior step.

Login to racnode1 as the grid user, locate the CVU script, then run the script with the parameters shown:

[grid@racnode1 ~]$ cd ./media/gi_12102/grid
[grid@racnode1 grid]$ ./runcluvfy.sh stage -pre crsinst -n racnode1,racnode2 -verbose 2>&1|tee cvu.out

This will run the CVU, echo its output to the screen and copy the output to a file called cvu.out.

The vast majority of the checks passed. Therefore, it only really makes sense to highlight the three which failed and more importantly what to do about them.

Failure #1: IPv6 Addressing.

The first error has to do with IPv6 addressing. Since we’re not using IPv6 (one day perhaps), we could ignore this error. However, IPv6 interface information will come up during the Grid Infrastructure installation, which only serves to confuse matters. I therefore prefer to remove the IPv6 references. These are the errors:

Check: TCP connectivity of subnet "2002:4b49:1933::"

Source                          Destination                     Connected?
------------------------------ ------------------------------ ----------------
racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298 racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298 passed
racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0 racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298 failed 
ERROR:
PRVG-11850 : The system call "connect" failed with error "13" while executing exectask on node 
"racnode2" Permission denied
racnode1 : 2002:4b49:1933:0:221:f6ff:fe04:4298 racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0 failed
ERROR:
PRVG-11850 : The system call "connect" failed with error "13" while executing exectask on node 
"racnode1" Permission denied
racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0 racnode2 : 2002:4b49:1933:0:221:f6ff:fed2:45a0 passed
Result: TCP connectivity check failed for subnet "2002:4b49:1933::"

You can confirm the presence of IPv6 addressing by checking the output of the ifconfig command. For example:

[root@racnode1 ~]# ifconfig eth0
eth0     Link encap:Ethernet HWaddr 00:21:F6:04:42:98
         inet addr:200.200.10.11 Bcast:200.200.10.255 Mask:255.255.255.0
         inet6 addr: fe80::221:f6ff:fe04:4298/64 Scope:Link
         inet6 addr: 2002:4b49:1933:0:221:f6ff:fe04:4298/64 Scope:Global
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:529808 errors:0 dropped:46361 overruns:0 frame:0
         TX packets:43797 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:79241367 (75.5 MiB) TX bytes:18208947 (17.3 MiB)

The clues are the lines beginning with “inet6”. To disable IPv6 across the board, add the following line to the /etc/sysctl.conf file:

[root@racnode1 ~]# vi /etc/sysctl.conf

# disable IPv6 support on all network interfaces:
net.ipv6.conf.all.disable_ipv6 = 1

If you only wanted to disable IPv6 support for a specific interface, for example eth0, then the entry in /etc/sysctl.conf would look like this:

# disable IPv6 support on the eth0 network interface:
net.ipv6.conf.eth0.disable_ipv6 = 1

To have this change take effect, either reboot or run this command:

[root@racnode1 ~]# sysctl -p /etc/sysctl.conf

To confirm the change has taken effect, re-run the ifconfig command:

[root@racnode1 ~]# ifconfig eth0
eth0     Link encap:Ethernet HWaddr 00:21:F6:04:42:98
         inet addr:200.200.10.11 Bcast:200.200.10.255 Mask:255.255.255.0
         UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
         RX packets:6481 errors:0 dropped:456 overruns:0 frame:0
         TX packets:464 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:1277621 (1.2 MiB) TX bytes:92893 (90.7 KiB)

Failure #2: Network Time Protocol (NTP).

The second failure reported in the CVU output had to do with Network Time Protocol (NTP). You will recall from the installation of Oracle Linux on racnode1, we chose not to synchronize the date and time over the network. That’s because I prefer to use Oracle’s Cluster Time Synchronization Service instead. These are the NTP errors reported by the CVU:

Starting Clock synchronization checks using Network Time Protocol(NTP)...
Checking existence of NTP configuration file "/etc/ntp.conf" across nodes
Node Name                             File exists?
------------------------------------ ------------------------
racnode2                             yes
racnode1                             yes

The NTP configuration file "/etc/ntp.conf" is available on all nodes
NTP configuration file "/etc/ntp.conf" existence check passed
No NTP Daemons or Services were found to be running
PRVF-5507 : NTP daemon or service is not running on any node but NTP 
configuration file exists on the following node(s):
racnode2,racnode1
Result: Clock synchronization check using Network Time Protocol(NTP) failed

When Grid Infrastructure is installed, it will detect whether NTP is running. If it is, the Cluster Time Synchronization Service is started in “observer mode”. If NTP is not running, the Cluster Time Synchronization Service is started in “active mode”, which is what we want. However, if NTP is down but the installer sees the NTP configuration file /etc/ntp.conf, it assumes NTP will spontaneously start by itself and the world will end as we know it. To prevent that from happening, we need to ensure NTP is down, stays down and can’t find its configuration file. We do that by renaming it:

[root@racnode1 ~]# service ntpd status
ntpd is stopped

[root@racnode1 ~]# chkconfig ntpd off

[root@racnode1 ~]# mv /etc/ntp.conf /etc/ntp.conf.ORIG

Thus the world is saved. Hurrah! ☑

Failure #3: Insufficient Swap Space.

I originally built the racnodes with 4 GB of RAM, but things ran a bit slow so I upped the RAM to 6 GB. One of the great things about virtualization is the ability to add more resources to a VM through software, assuming the resources are physically present and available.

Note that Oracle VM cannot over-allocate physical server memory; you can only allocate what is physically available in the Oracle VM Server.

Since I originally had 4 GB of RAM allocated, I also configured 4 GB of swap space. Increasing the amount of RAM means you should also increase the amount of swap space. Here’s the official Oracle sizing table:

RAM                      Swap Space
-----------------------  ------------------
Between 1 GB and 2 GB    1.5 times RAM
Between 2 GB and 16 GB   Same size as RAM
Greater than 16 GB       16 GB

With 6 GB of RAM we now fall into the “same size as RAM” band, so roughly 6 GB of swap is expected, which explains why the existing 4 GB fails the check. Here is what the CVU reported regarding memory and swap space:

Check: Total memory
Node Name     Available                 Required                 Status  
------------ ------------------------ ------------------------ ----------
racnode2     5.8369GB (6120432.0KB)   4GB (4194304.0KB)         passed  
racnode1     5.8369GB (6120432.0KB)   4GB (4194304.0KB)         passed  
Result: Total memory check passed

Check: Swap space
Node Name     Available                 Required                 Status  
------------ ------------------------ ------------------------ ----------
racnode2     4GB (4194300.0KB)         5.8369GB (6120432.0KB)   failed  
racnode1     4GB (4194300.0KB)         5.8369GB (6120432.0KB)   failed  
Result: Swap space check failed

There are a few ways to increase the amount of swap space on a Linux server. If you have sufficient free disk space, you can add another swap partition. If the current swap space is in a logical volume, you could re-size the volume. The quickest and simplest way is to add a swap file; a sketch of the commands is shown below. Since this is a non-production system, swapping won’t be a problem, so we can safely ignore this failure. If it were a production system, we’d obviously do the right thing… blame the Systems Administrator!
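
For completeness, here’s a minimal sketch of adding a 2 GB swap file (the size and file name are illustrative; scale it up to cover the shortfall on your nodes):

[root@racnode1 ~]# dd if=/dev/zero of=/swapfile bs=1M count=2048
[root@racnode1 ~]# chmod 600 /swapfile
[root@racnode1 ~]# mkswap /swapfile
[root@racnode1 ~]# swapon /swapfile
[root@racnode1 ~]# echo "/swapfile swap swap defaults 0 0" >> /etc/fstab
[root@racnode1 ~]# swapon -s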

Well, that about wraps it up for SSH, DNS and CVU.

See you next time for all things shared storage in Part 9.

If you have any comments or questions about this post, please use the Contact form here.