How to migrate Oracle 10gR2 Clusterware and ASM storage devices on AIX

I haven't done a technical post in a while, so I thought I'd share some stuff I'd been working on recently.


It may sometimes be necessary to migrate the storage devices used by a database and Oracle Clusterware, for example during a SAN upgrade or a move between storage frames.  AIX provides a native command, migratepv, which can move the contents of a physical volume (and therefore volume groups and filesystems) to new disks.  However, for Clusterware (OCR and voting disks) and ASM, which reside on raw devices, a different method is required.  The ocrconfig and crsctl commands are the preferred tools for working with the OCR and voting disk files.  Migrating AIX volume groups with migratepv is beyond the scope of this article, but Chris Gibson's article covers it in detail: http://www.ibm.com/developerworks/aix/library/au-aixstorage/.

This document will describe the steps used to migrate the following types of devices attached to a two-node Oracle Database Server Enterprise Edition 10.2.0.4.0 (64-bit) RAC cluster with ASM on IBM AIX 5.3.  All are raw devices residing on shared storage (SAN).  It is assumed that the storage to which the database will be migrated has been allocated and presented to the servers with the correct permissions.

  • ASM disks
  • OCR disks
  • Voting disks
  • ASM spfile

Note:  It is highly recommended to back up your OCR, voting disks and any databases using ASM before making any changes.

Automatic Storage Management (ASM)
The benefits of using ASM for database storage become tangible in a scenario like this: ASM devices can be migrated online, without downtime[1].  Once the new devices have been presented to the cluster nodes, all that remains is to add them to the existing disk groups and drop the old devices.
It is good practice to verify that the new devices are available on all nodes in the cluster before adding them to a disk group.  If device aliases are used, you can easily list the devices.

[oracle@oradba10t oracle]$ ls -lt /dev/asm*
crw-rw----    1 oracle   dba          40, 16 Aug 16 11:16 /dev/asm16
crw-rw----    1 oracle   dba          40, 18 Aug 16 11:16 /dev/asm18
crw-rw----    1 oracle   dba          40, 20 Aug 16 11:16 /dev/asm20
crw-rw----    1 oracle   dba          40, 19 Aug 15 13:00 /dev/asm19
crw-rw----    1 oracle   dba          40, 17 Aug 15 13:00 /dev/asm17
crw-rw----    1 oracle   dba          40, 22 Aug 15 13:00 /dev/asm22
crw-rw----    1 oracle   dba          40, 21 Aug 13 22:00 /dev/asm21
crw-rw----    1 oracle   dba          40, 11 Jul 27 13:11 /dev/asm11
crw-rw----    1 oracle   dba          40,  6 Jul 27 10:22 /dev/asm6
crw-rw----    1 oracle   dba          40,  7 Jul 27 10:19 /dev/asm7
crw-rw----    1 oracle   dba          40, 26 Jul 26 17:25 /dev/asmspf_disk1
crw-rw----    1 oracle   dba          40, 10 Jul 19 17:43 /dev/asm10
crw-rw----    1 oracle   dba          40, 14 Jul 12 16:50 /dev/asm14
crw-rw----    1 oracle   dba          40, 13 Jul 12 16:35 /dev/asm13
crw-rw----    1 oracle   dba          40, 12 Jul 12 15:59 /dev/asm12
crw-rw----    1 oracle   dba          40,  8 Jul 12 15:26 /dev/asm8
crw-rw----    1 oracle   dba          40,  9 Jul 08 23:02 /dev/asm9
crw-rw----    1 oracle   dba          40, 23 Jul 08 09:47 /dev/asm23
crw-rw----    1 oracle   dba          40, 24 Jul 08 09:47 /dev/asm24
crw-rw----    1 oracle   dba          40, 25 Jul 08 09:47 /dev/asm25
crw-rw----    1 oracle   dba          40,  5 Jun 17 14:18 /dev/asmspf_disk
crw-rw----    1 oracle   dba          40, 15 Jul 02 2009  /dev/asm15
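The same listing should match on the second node.  Assuming ssh user equivalence is configured between the nodes (a convenience, not a requirement of the procedure), a quick cross-node check could look like this:

[oracle@oradba10t oracle]$ for node in oradba10t oradba11t
> do
> echo "### $node"; ssh $node "ls -l /dev/asm*"
> done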

You can call me paranoid, but I also like to re-check the reserve lock/policy settings on the devices.  The setting should be no (or no_reserve, depending on the multipathing driver in use); I have had bad experiences with ASM devices that still had their reserve locks set.  The commands below show the LUN id, reserve lock setting and PVID for devices hdiskpower16 through hdiskpower25.

[oracle@oradba10t oracle]$ for i in 16 17 18 19 20 21 22 23 24 25
> do
> lsattr -El hdiskpower$i | grep reserve_lock | awk '{print $1,$2 }'| read rp1 rp2
> lsattr -El hdiskpower$i | grep pvid | awk '{print $1,$2 }'| read pv1 pv2
> lsattr -El hdiskpower$i | grep lun_id | awk '{print $1,$2 }'| read li1 li2
> if [ "$li1" != "" ]
> then
> echo hdiskpower$i' -> '$li1' = '$li2' / '$rp1' = '$rp2' / '$pv1' = '$pv2
> fi
> done
hdiskpower16 -> lun_id = 0x1000000000000 / reserve_lock = no / pvid = none
hdiskpower17 -> lun_id = 0x3000000000000 / reserve_lock = no / pvid = none
hdiskpower18 -> lun_id = 0x4000000000000 / reserve_lock = no / pvid = none
hdiskpower19 -> lun_id = 0x5000000000000 / reserve_lock = no / pvid = none
hdiskpower20 -> lun_id = 0x6000000000000 / reserve_lock = no / pvid = none
hdiskpower21 -> lun_id = 0x7000000000000 / reserve_lock = no / pvid = none
hdiskpower22 -> lun_id = 0x8000000000000 / reserve_lock = no / pvid = none
hdiskpower23 -> lun_id = 0x9000000000000 / reserve_lock = no / pvid = none
hdiskpower24 -> lun_id = 0xa000000000000 / reserve_lock = no / pvid = none
hdiskpower25 -> lun_id = 0xb000000000000 / reserve_lock = no / pvid = none
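If any device still shows its reserve lock set, it can be cleared with chdev while the disk is not open.  A minimal sketch, assuming EMC PowerPath hdiskpower devices (native MPIO hdisks use the reserve_policy attribute, e.g. reserve_policy=no_reserve, instead):

[root@oradba10t /]$ chdev -l hdiskpower20 -a reserve_lock=no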

In a migration there are typically numerous devices to move.  Depending on your LUN sizes and the number of devices, this could take a long time due to the amount of rebalancing involved.  To reduce the number of rebalance operations, you can add all the new disks with a rebalance power of 0, which effectively disables rebalancing until you are ready.

SQL> alter diskgroup dg_data add disk '/dev/asm20','/dev/asm21','/dev/asm22' rebalance power 0;

Diskgroup altered.

SQL> select * from v$asm_operation;

no rows selected

Once the new disks have been added, drop all of the old disks except the last one, again using a rebalance power of 0.  When the final old disk is dropped with a rebalance power greater than 0, a single rebalance operation starts immediately and relocates the data onto the new disks; a sketch of the drop sequence is shown below.
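A minimal sketch, assuming the old devices correspond to the hypothetical ASM disk names DG_DATA_0001, DG_DATA_0002 and DG_DATA_0003 (query V$ASM_DISK for the actual names and paths in your disk group):

SQL> select name, path from v$asm_disk where group_number <> 0;
SQL> alter diskgroup dg_data drop disk DG_DATA_0001, DG_DATA_0002 rebalance power 0;
SQL> alter diskgroup dg_data drop disk DG_DATA_0003 rebalance power 4;
SQL> select * from v$asm_operation;

The dropped disks remain members of the disk group until the rebalance completes, so do not reclaim them at the operating system level until V$ASM_OPERATION returns no rows.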

Oracle Cluster Registry (OCR)
The OCR stores the Clusterware configuration.  In a typical configuration the OCR is two-way mirrored (an OCR device plus an OCR mirror).  Clusterware automatically creates backups of the OCR at periodic intervals; see the Oracle documentation for more details: http://download.oracle.com/docs/cd/B19306_01/rac.102/b14197/votocr.htm#CIHFBJEF

The ocrconfig command can be used to administer the OCR configuration, including adding/removing and replacing OCR disks and mirrors.  The ocrconfig command should be run as root.

Check if there are recent backups of the OCR files.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrconfig -showbackup

oradba11t     2010/07/29 09:09:52     /u01/app/oracle/product/10/crs/cdata/crs10g
oradba11t     2010/07/29 05:09:51     /u01/app/oracle/product/10/crs/cdata/crs10g
oradba11t     2010/07/29 01:09:50     /u01/app/oracle/product/10/crs/cdata/crs10g
oradba11t     2010/07/27 17:09:45     /u01/app/oracle/product/10/crs/cdata/crs10g
oradba11t     2010/07/22 09:09:24     /u01/app/oracle/product/10/crs/cdata/crs10g

The showbackup option lists the hostname, backup date/time and path of each backup.
Note: the backups may not all reside on the same server.

[oracle@oradba11t dbs]$ ls -ltr /u01/app/oracle/product/10/crs/cdata/crs10g
total 84792
-rw-r--r--    1 root     system      6201344 Jul 22 09:09 week.ocr
-rw-r--r--    1 root     system      6201344 Jul 27 17:09 day.ocr
-rw-r--r--    1 root     system      6201344 Jul 28 17:09 day_.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 05:09 backup02.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 09:09 week_.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 09:09 backup01.ocr
-rw-r--r--    1 root     system      6201344 Jul 29 13:09 backup00.ocr

If a recent OCR backup does not exist you should export the online OCR file.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrconfig -export /u02/backup/ocr_backup_07292010 -s online

[root@oradba10t /]$ ls -ltr /u02/backup/ocr*
-rw-r--r--    1 root     system       143260 Jul 29 13:29 /u02/backup/ocr_backup_07292010

Verify the integrity and status of the current OCR files using ocrcheck.

[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1055724
         Used space (kbytes)      :       5656
         Available space (kbytes) :    1050068
         ID                       : 2023180143
         Device/File Name         : /dev/ocr_disk1
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/ocr_disk2
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

Check if the new OCR devices are visible from all nodes.  Using aliases makes this much simpler.  If your devices don’t have aliases check with your system administrator for the correct device names.

[root@oradba10t /]$ ls -ltr /dev/ocr*
crw-r-----    1 oracle   dba          40,  0 Jun 29 2009  /dev/ocr_disk1
crw-r-----    1 oracle   dba          40,  1 Jun 29 2009  /dev/ocr_disk2
crw-r-----    1 root     dba          40, 27 Jul 29 10:12 /dev/ocr_disk3
crw-r-----    1 root     dba          40, 28 Jul 29 10:12 /dev/ocr_disk4

Check that the permissions on the new devices match the existing ones, i.e. the OCR devices should be owned by root and the dba group with 640 permissions.  This should be the same on both nodes.
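For example, assuming the new devices are /dev/ocr_disk3 and /dev/ocr_disk4 as listed above, the ownership and mode could be set like this on each node:

[root@oradba10t /]$ chown root:dba /dev/ocr_disk3 /dev/ocr_disk4
[root@oradba10t /]$ chmod 640 /dev/ocr_disk3 /dev/ocr_disk4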
Go to the $CRS_HOME/bin directory and run ocrconfig from one node as root.

[root@oradba10t bin]$ ./ocrconfig -replace ocr /dev/ocr_disk3
[root@oradba10t bin]$ ./ocrconfig -replace ocrmirror /dev/ocr_disk4

Run ocrcheck on all nodes to verify the integrity of the cluster registry.
[root@oradba10t bin]$ ./ocrcheck
Status of Oracle Cluster Registry is as follows :
         Version                  :          2
         Total space (kbytes)     :    1055724
         Used space (kbytes)      :       5656
         Available space (kbytes) :    1050068
         ID                       : 2023180143
         Device/File Name         : /dev/ocr_disk3
                                    Device/File integrity check succeeded
         Device/File Name         : /dev/ocr_disk4
                                    Device/File integrity check succeeded

         Cluster registry integrity check succeeded

Check the logs for errors under $CRS_HOME/log/<hostname>/client:

  • ocrconfig generates log files named like ocrconfig_xxxxxx.log
  • ocrcheck generates log files named like ocrcheck_xxxxxx.log
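For example, using the CRS home layout from earlier in this article (paths are illustrative):

[root@oradba10t /]$ ls -ltr /u01/app/oracle/product/10/crs/log/oradba10t/client | tail -5
[root@oradba10t /]$ grep -i error /u01/app/oracle/product/10/crs/log/oradba10t/client/ocrconfig_*.log
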
Voting disks
Shut down the database and stop CRS on both nodes before making any changes to the voting disks.  It is also recommended to back up your voting disks first.
[root@oradba10t /]$ /u01/app/oracle/product/10/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

[root@oradba11t /]$ /u01/app/oracle/product/10/crs/bin/crsctl stop crs
Stopping resources. This could take several minutes.
Successfully stopped CRS resources.
Stopping CSSD.
Shutting down CSS daemon.
Shutdown request successfully issued.

Back up the voting disks (this could take a long time).

[root@oradba10t client]$ dd if=/dev/voting_disk1 of=/u02/backup/voting_disk1_backup_07292010
2112000+0 records in.
2112000+0 records out.
[root@oradba10t /]$ dd if=/dev/voting_disk2 of=/u02/backup/voting_disk2_backup_07292010
2112000+0 records in.
2112000+0 records out.
[root@oradba10t /]$ dd if=/dev/voting_disk3 of=/u02/backup/voting_disk3_backup_07292010
2112000+0 records in.
2112000+0 records out.

Check the permissions on the new devices.  Voting disks should be owned by the oracle user and dba group with 644 permissions.
As root, issue the crsctl commands from one node.

[root@oradba10t /]$ cd /u01/app/oracle/product/10/crs/bin
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk1 -force
successful deletion of votedisk /dev/voting_disk1.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk4 -force
Now formatting voting disk: /dev/voting_disk4
successful addition of votedisk /dev/voting_disk4.
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk2 -force
successful deletion of votedisk /dev/voting_disk2.
[root@oradba10t bin]$ ./crsctl delete css votedisk /dev/voting_disk3 -force
successful deletion of votedisk /dev/voting_disk3.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk5 -force
Now formatting voting disk: /dev/voting_disk5
successful addition of votedisk /dev/voting_disk5.
[root@oradba10t bin]$ ./crsctl add css votedisk /dev/voting_disk6 -force
Now formatting voting disk: /dev/voting_disk6
successful addition of votedisk /dev/voting_disk6.

Check the status of the voting disks on both nodes using crsctl

[root@oradba10t bin]$ ./crsctl query css votedisk
 0.     0    /dev/voting_disk4
 1.     0    /dev/voting_disk5
 2.     0    /dev/voting_disk6
located 3 votedisk(s).

[root@oradba11t /]$ /u01/app/oracle/product/10/crs/bin/crsctl query css votedisk
 0.     0    /dev/voting_disk4
 1.     0    /dev/voting_disk5
 2.     0    /dev/voting_disk6

located 3 votedisk(s).

Start CRS on both nodes as root

[root@oradba10t bin]$ ./crsctl start crs
Attempting to start CRS stack
The CRS stack will be started shortly

Wait about 1 minute before checking the status.  Sometimes it takes longer.
[root@oradba10t bin]$ ./crsctl check crs
CSS appears healthy
Cannot communicate with CRS
EVM appears healthy

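Rather than re-running the check by hand, a simple polling loop can be used (a sketch; it assumes the 10.2 "CRS appears healthy" message that crsctl prints once the stack is fully up):

[root@oradba10t bin]$ until ./crsctl check crs | grep -q "CRS appears healthy"
> do
> sleep 30
> done
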
If you encounter any issues, check the cluster alert log under $CRS_HOME/log/<hostname>/alert<hostname>.log to make sure the voting disks are online:
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk4. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.
2010-07-29 16:38:26.173
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk5. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.
2010-07-29 16:38:26.176
[cssd(1278060)]CRS-1605:CSSD voting file is online: /dev/voting_disk6. Details in /u01/app/oracle/product/10/crs/log/oradba10t/cssd/ocssd.log.

Repeat the procedure on the other node(s) and verify the status of the Clusterware resources.

[root@oradba10t oradba10t]$ /u01/app/oracle/product/10/crs/bin/crs_stat -t
Name           Type           Target    State     Host
------------------------------------------------------------
ora....vice.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora.labrac.db  application    ONLINE    ONLINE    oradba10t
ora....c1.inst application    ONLINE    ONLINE    oradba10t
ora....c2.inst application    ONLINE    ONLINE    oradba11t
ora....asic.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora....ac2.srv application    ONLINE    ONLINE    oradba11t
ora...._taf.cs application    ONLINE    ONLINE    oradba10t
ora....ac1.srv application    ONLINE    ONLINE    oradba10t
ora....SM1.asm application    ONLINE    ONLINE    oradba10t
ora....0T.lsnr application    ONLINE    ONLINE    oradba10t
ora....10t.gsd application    ONLINE    ONLINE    oradba10t
ora....10t.ons application    ONLINE    ONLINE    oradba10t
ora....10t.vip application    ONLINE    ONLINE    oradba10t
ora....SM2.asm application    ONLINE    ONLINE    oradba11t
ora....1T.lsnr application    ONLINE    ONLINE    oradba11t
ora....11t.gsd application    ONLINE    ONLINE    oradba11t
ora....11t.ons application    ONLINE    ONLINE    oradba11t
ora....11t.vip application    ONLINE    ONLINE    oradba11t


ASM Spfile

The ASM spfile can reside on either a local or a shared device.  If it is on a local device, migratepv can be used to move it to the new storage.  If it is on a shared raw device, you can move the spfile using the steps described below.

Each ASM instance in the cluster has its own initialization parameter file under $ORA_ASM_HOME/dbs.  This file contains only a single entry, pointing to the spfile on the shared device.

oracle@tibora10d[+ASM1]-/u01/oracle/asm/dbs >cat init+ASM1.ora
SPFILE='/dev/asm_spfile1'

Export the ASM instance SID and ASM Oracle Home on each server.

export ORACLE_SID=+ASM1
export ORACLE_HOME=/u01/app/oracle/product/10.2.0/asm

Connect to your ASM instance and display the current spfile location:
[oracle@oradba10t oradba10t]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 26 15:31:39 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.


Connected to:
Oracle Database 10g Enterprise Edition Release 10.2.0.4.0 - 64bit Production
With the Partitioning, Real Application Clusters, OLAP, Data Mining
and Real Application Testing options

SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /dev/asmspf_disk

Check your new and existing ASM spfile devices to verify the correct permissions have been set.
[oracle@oradba10t dbs]$ ls -l /dev/asmspf*
crw-rw----    1 oracle   dba          40,  5 Jun 17 14:18 /dev/asmspf_disk
crw-rw----    1 oracle   dba          40, 26 Jul 08 09:53 /dev/asmspf_disk1

First create a pfile for your ASM instance on each node
SQL> create pfile='initASM1pfile.ora' from spfile;

Next, shut down the ASM instances and start up in nomount mode using the pfile.
[oracle@oradba10t dbs]$ srvctl stop asm -n oradba10t
[oracle@oradba10t dbs]$ srvctl stop asm -n oradba11t

SQL> startup nomount pfile='initASM1pfile.ora';
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2082312 bytes
Variable Size             102775288 bytes
ASM Cache                  25165824 bytes

From one node only, create the spfile on the new shared device.

SQL> create spfile='/dev/asmspf_disk1' from pfile='initASM1pfile.ora';

File created.

Shutdown the ASM instance
SQL> shutdown immediate;
ORA-15100: invalid or missing diskgroup name


ASM instance shutdown

You can ignore the ORA-15100 error message; it occurs because the instance is in nomount mode and no diskgroups are mounted.

Go to your $ORA_ASM_HOME/dbs directory and edit the init+ASMn.ora file on each node to point to the new device.
[oracle@oradba10t dbs]$ vi init+ASM1.ora
"init+ASM1.ora" 1 line, 26 characters
SPFILE='/dev/asmspf_disk1'

Connect to your ASM instance using SQL*Plus and start it up.
[oracle@oradba10t dbs]$ sqlplus / as sysdba

SQL*Plus: Release 10.2.0.4.0 - Production on Mon Jul 26 17:27:43 2010

Copyright (c) 1982, 2007, Oracle.  All Rights Reserved.

Connected to an idle instance.

SQL> startup
ASM instance started

Total System Global Area  130023424 bytes
Fixed Size                  2082312 bytes
Variable Size             102775288 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

Verify that the spfile parameter now points to the new device.
SQL> show parameter spfile

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
spfile                               string      /dev/asmspf_disk1
SQL>

Once the ASM instances start up using the new spfile, you can update the ASM entry in the OCR so that srvctl also uses the new location.

Shut down the ASM instances on both nodes and modify the registered spfile location:


srvctl modify asm -n oradba10t -i +ASM1 -o /u01/app/oracle/product/10/asm -p /dev/asmspf_disk1
srvctl modify asm -n oradba11t -i +ASM2 -o /u01/app/oracle/product/10/asm -p /dev/asmspf_disk1

Start up and check the status of each ASM instance using srvctl.
[oracle@oradba10t dbs]$ srvctl start asm -n oradba10t
[oracle@oradba10t dbs]$ srvctl status asm -n oradba10t
ASM instance +ASM1 is running on node oradba10t.
[oracle@oradba10t dbs]$ srvctl start asm -n oradba11t
[oracle@oradba10t dbs]$ srvctl status asm -n oradba11t
ASM instance +ASM2 is running on node oradba11t.

References

  • Moving ASM spfile to a shared device in RAC [ID 973031.1]
  • OCR / Vote disk Maintenance Operations: (ADD/REMOVE/REPLACE/MOVE), including moving from RAW Devices to Block Devices [ID 428681.1]
  • Oracle® Database Oracle Clusterware and Oracle Real Application Clusters Administration and Deployment Guide, 10g Release 2 (10.2)


[1] Downtime is still required for reclaiming the old devices, due to bugs in 10.2.0.4 that cause the ASM file descriptors to remain open even after the devices are dropped from the disk groups.  See bugs 4693355 and 7225720.
