Monday, July 8, 2013

ASM 12c New Feature Replace Command

We are going to test the new REPLACE command against a simulated disk failure. In the test, disk DG_DISK1A is assumed to have failed because of a bad platter and is replaced with disk DG_DISK4A. Before 12c, this required dropping the failed disk from the diskgroup and then adding the new disk, which triggered a complete rebalance of the diskgroup and consumed time and resources.


Setup and Configure Disk

Added new VDI

disk dg_mirror4a.vdi

Partition New Disk

As root
#cd /dev
#fdisk sdu
sequence of answers "n", "p", "1" ,"Return", "Return", "p" and "w"
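
The answer sequence listed above can also be fed to fdisk non-interactively. This is only a minimal sketch: the helper name fdisk_answers is made up for illustration, and it assumes fdisk asks its questions in exactly the order shown above. Prompts differ between fdisk versions, so run through the interactive session once before trusting it.

```shell
# Hypothetical helper: emit the fdisk answers listed above, one per line.
# The two empty strings stand for "Return" (accept the default
# first and last sector).
fdisk_answers() {
    printf '%s\n' n p 1 '' '' p w
}

# Usage (as root; destructive, writes a new partition table):
#   fdisk_answers | fdisk /dev/sdu
```

Keeping the answers in a function makes it possible to inspect them (for example with `fdisk_answers | cat -A`) before piping them into fdisk.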

Configure Disk for ASM

As root
#/usr/sbin/oracleasm createdisk DG_DISK4A /dev/sdu1

Scan ASM Disk

As root
#/usr/sbin/oracleasm scandisks

List ASM Disk

As root
#/usr/sbin/oracleasm listdisks

Output
[root@alpddbs002 dev]# /usr/sbin/oracleasm listdisks
DG_DISK1A
DG_DISK1B
DG_DISK2A
DG_DISK2B
DG_DISK3A
DG_DISK3B
DG_DISK4A
DISK1
DISK2
DISK3
DISK4
FRA1
FRA_DISK1A
FRA_DISK1B

Check ASM Disk

As root
#/etc/init.d/oracleasm querydisk -d `/etc/init.d/oracleasm listdisks -d` | \
cut -f2,10,11 -d" " | \
perl -pe 's/"(.*)".*\[(.*), *(.*)\]/$1 $2 $3/g;' | \
while read v_asmdisk v_major v_minor
do
v_device=`ls -la /dev | grep " $v_major, *$v_minor " | awk '{print $10}'`
echo "ASM disk $v_asmdisk based on /dev/$v_device [$v_major, $v_minor]"
done

Output
ASM disk DG_DISK1A based on /dev/sdm1 [8, 193]
ASM disk DG_DISK1B based on /dev/sdp1 [8, 241]
ASM disk DG_DISK2A based on /dev/sdn1 [8, 209]
ASM disk DG_DISK2B based on /dev/sdq1 [65, 1]
ASM disk DG_DISK3A based on /dev/sdo1 [8, 225]
ASM disk DG_DISK3B based on /dev/sdr1 [65, 17]
ASM disk DG_DISK4A based on /dev/sdu1 [65, 65]
ASM disk DISK1 based on /dev/sdg1 [8, 97]
ASM disk DISK2 based on /dev/sdh1 [8, 113]
ASM disk DISK3 based on /dev/sdi1 [8, 129]
ASM disk DISK4 based on /dev/sdj1 [8, 145]
ASM disk FRA1 based on /dev/sdk1 [8, 161]
ASM disk FRA_DISK1A based on /dev/sds1 [65, 33]
ASM disk FRA_DISK1B based on /dev/sdt1 [65, 49]

alpddbs002:{+ASM}:/home/oracle >asmcmd lsdg
State    Type    Rebal  Sector  Block       AU  Total_MB  Free_MB  Req_mir_free_MB  Usable_file_MB  Offline_disks  Voting_files  Name
MOUNTED  EXTERN  N         512   4096  1048576     32756    18438                0           18438              0             N  DG01/
MOUNTED  NORMAL  N         512   4096  1048576     61416    57686            10236           23725              0             N  DG_MIRROR/
MOUNTED  EXTERN  N         512   4096  1048576      8189     3294                0            3294              0             N  FRA01/
MOUNTED  NORMAL  N         512   4096  1048576     30716    29643                0           14821              0             N  FRA_MIRROR/

alpddbs002:{+ASM}:/home/oracle >asmcmd lsdsk -k -G dg_mirror
Total_MB  Free_MB  OS_MB  Name       Failgroup  Failgroup_Type  Library                                               Label      UDID  Product  Redund   Path
   10236     9617  10236  DG_DISK1A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK1A                 UNKNOWN  ORCL:DG_DISK1A
   10236     9613  10236  DG_DISK1B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK1B                 UNKNOWN  ORCL:DG_DISK1B
   10236     9616  10236  DG_DISK2A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK2A                 UNKNOWN  ORCL:DG_DISK2A
   10236     9614  10236  DG_DISK2B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK2B                 UNKNOWN  ORCL:DG_DISK2B
   10236     9610  10236  DG_DISK3A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK3A                 UNKNOWN  ORCL:DG_DISK3A
   10236     9616  10236  DG_DISK3B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK3B                 UNKNOWN  ORCL:DG_DISK3B

Test

1. Issue the command below to simulate a failed device:


#echo 1 > /sys/block/sdm/device/delete
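
Because this command removes a device from the running system, it may be worth wrapping it in a small guard. The following is a sketch only: the function name simulate_disk_failure and its optional second argument (an alternative sysfs root, useful for dry runs) are assumptions for illustration and not part of the original test.

```shell
# Hypothetical helper: remove a SCSI block device through sysfs.
# $1 = device name (for example sdm)
# $2 = sysfs root, defaults to /sys (override it for a dry run)
simulate_disk_failure() {
    dev=$1
    sysfs=${2:-/sys}
    ctl="$sysfs/block/$dev/device/delete"
    # Refuse to continue if the control file is missing, for example
    # because of a typo in the device name.
    if [ ! -e "$ctl" ]; then
        echo "no such device control file: $ctl" >&2
        return 1
    fi
    echo 1 > "$ctl"
}

# Usage (as root):
#   simulate_disk_failure sdm
```

On a test system the device can usually be brought back by rescanning the SCSI host (writing "- - -" to /sys/class/scsi_host/hostN/scan, where N has to be looked up for your system). It may come back under a different device name.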

The DB alert log shows the following when the device fails:
Tue Jul 09 00:29:26 2013
WARNING: Read Failed. group:2 disk:3 AU:24 offset:16384 size:16384
path:ORCL:DG_DISK1A
         incarnation:0x7fff synchronous result:'I/O error'
         subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x2af156b1c7b0 bufp:0x2af156b9ce00 osderr1:0x3 osderr2:0x2e
         IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: failed to read mirror side 1 of virtual extent 0 logical extent 0 of file 256 in group [2.3581463315] from disk DG_DISK1A  allocation unit 24 reason error; if possible, will try another mirror side
WARNING: group 2 file 256 vxn 0 block 1 read I/O failed
WARNING: Write Failed. group:2 disk:3 AU:24 offset:49152 size:16384
path:ORCL:DG_DISK1A
         incarnation:0x7fff asynchronous result:'I/O error'
         subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x2af156b1c7b0 bufp:0x2af156835e00 osderr1:0x3 osderr2:0x2e
         IO elapsed time: 3000 usec Time waited on I/O: 0 usec
Tue Jul 09 00:29:27 2013
Errors in file /orabase/diag/rdbms/dbtest3/dbtest3/trace/dbtest3_ckpt_5347.trc:
ORA-15080: synchronous I/O operation failed to write block 3 of disk 3 in disk group DG_MIRROR
WARNING: failed to write mirror side 1 of virtual extent 0 logical extent 0 of file 256 in group 2 on disk 3 allocation unit 24
NOTE: process _ckpt_dbtest3 (5347) initiating offline of disk 3.32767 (DG_DISK1A) with mask 0x7e in group 2 (DG_MIRROR) with client assisting
Tue Jul 09 00:29:27 2013
NOTE: updating disk modes to 0x5 from 0x7 for disk 3 (DG_DISK1A) in group 2 (DG_MIRROR): lflags 0x0  
NOTE: disk 3 (DG_DISK1A) in group 2 (DG_MIRROR) is offline for reads
Tue Jul 09 00:29:27 2013
NOTE: ospid 5347 initiating cluster wide offline of disk 3 in group 2
Tue Jul 09 00:29:27 2013
NOTE: disk 3 (DG_DISK1A) in group 2 (DG_MIRROR) is locally offline for writes
Tue Jul 09 00:29:27 2013
NOTE: disk 3 (DG_DISK1A) in group 2 (DG_MIRROR) is offline for writes

The ASM alert log shows the following when the device fails:
Tue Jul 09 00:29:27 2013
WARNING: Write Failed. group:2 disk:3 AU:1 offset:1044480 size:4096
path:ORCL:DG_DISK1A
         incarnation:0xe9683bf3 asynchronous result:'I/O error'
         subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x2adc483b0e28 bufp:0x2adc487f1600 osderr1:0x3 osderr2:0x2e
         IO elapsed time: 3000 usec Time waited on I/O: 0 usec
Tue Jul 09 00:29:27 2013
NOTE: process _user9905_+asm (9905) initiating offline of disk 3.3915922419 (DG_DISK1A) with mask 0x7e in group 2 (DG_MIRROR) with client assisting
NOTE: checking PST: grp = 2
Tue Jul 09 00:29:27 2013
GMON checking disk modes for group 2 at 35 for pid 31, osid 9905
Tue Jul 09 00:29:27 2013
NOTE: checking PST for grp 2 done.
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0xe9683bf3, mask = 0x6a, op = clear
Tue Jul 09 00:29:27 2013
GMON updating disk modes for group 2 at 36 for pid 31, osid 9905
WARNING: Write Failed. group:2 disk:3 AU:1 offset:1044480 size:4096
path:ORCL:DG_DISK1A
         incarnation:0xe9683bf3 synchronous result:'I/O error'
         subsys:/opt/oracle/extapi/64/asm/orcl/1/libasm.so krq:0x2adc487fc2c8 bufp:0x2adc48458c00 osderr1:0x3 osderr2:0x2e
         IO elapsed time: 0 usec Time waited on I/O: 0 usec
WARNING: found another non-responsive disk 3.3915922419 (DG_DISK1A) that will be offlined
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:29:27 2013
NOTE: PST update grp = 2 completed successfully
Tue Jul 09 00:29:27 2013
NOTE: process _b000_+asm (18830) initiating offline of disk 3.3915922419 (DG_DISK1A) with mask 0x7e in group 2 (DG_MIRROR) without client assisting
NOTE: checking PST: grp = 2
Tue Jul 09 00:29:27 2013
GMON checking disk modes for group 2 at 37 for pid 30, osid 18830
Tue Jul 09 00:29:27 2013
NOTE: checking PST for grp 2 done.
Tue Jul 09 00:29:27 2013
NOTE: sending set offline flag message (1112324913) to 1 disk(s) in group 2
Tue Jul 09 00:29:27 2013
WARNING: Disk 3 (DG_DISK1A) in group 2 mode 0x15 is now being offlined
Tue Jul 09 00:29:27 2013
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0xe9683bf3, mask = 0x6a, op = clear
Tue Jul 09 00:29:27 2013
GMON updating disk modes for group 2 at 38 for pid 30, osid 18830
Tue Jul 09 00:29:27 2013
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0xe9683bf3, mask = 0x7e, op = clear
Tue Jul 09 00:29:27 2013
GMON updating disk modes for group 2 at 39 for pid 30, osid 18830
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:29:27 2013
NOTE: cache closing disk 3 of grp 2: DG_DISK1A
Tue Jul 09 00:29:27 2013
NOTE: PST update grp = 2 completed successfully
Tue Jul 09 00:29:30 2013
WARNING: Hbeat write to PST disk 3.3915922419 in group 2 failed. [2]


2. Log in to ASM as SYSASM

SQL>ALTER DISKGROUP DG_MIRROR REPLACE DISK DG_DISK1A WITH 'ORCL:DG_DISK4A'
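
The statement can also be scripted. The sketch below only prints the SQL so it can be reviewed before it is piped into SQL*Plus; the helper name replace_disk_sql is made up for illustration, and the follow-up query against V$ASM_OPERATION is an addition that was not part of the original test. It assumes the ASM environment (ORACLE_HOME, ORACLE_SID=+ASM) is already set.

```shell
# Hypothetical helper: print the REPLACE DISK statement plus a progress query.
# $1 = diskgroup, $2 = ASM name of the failed disk, $3 = path of the new disk
replace_disk_sql() {
    cat <<EOF
ALTER DISKGROUP $1 REPLACE DISK $2 WITH '$3';
SELECT group_number, operation, state, power, est_minutes FROM v\$asm_operation;
EOF
}

# Usage (as the ASM software owner):
#   replace_disk_sql DG_MIRROR DG_DISK1A ORCL:DG_DISK4A | sqlplus -s "/ as sysasm"
```

Re-running the SELECT while the disk is being populated shows whether the operation is still in progress; it returns no rows once the work is done.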

The DB alert log shows the following when the disk is replaced:
Tue Jul 09 00:34:29 2013
NOTE: Found ORCL:DG_DISK4A for disk DG_DISK1A
SUCCESS: disk DG_DISK1A (3.32767) replaced in diskgroup DG_MIRROR path: ORCL:DG_DISK4A
NOTE: updating disk modes to 0x5 from 0x1 for disk 3 (DG_DISK1A) in group 2 (DG_MIRROR): lflags 0x0  
NOTE: disk 3 (DG_DISK1A) in group 2 (DG_MIRROR) is online for writes
Tue Jul 09 00:35:40 2013
NOTE: updating disk modes to 0x7 from 0x5 for disk 3 (DG_DISK1A) in group 2 (DG_MIRROR): lflags 0x0  
NOTE: disk 3 (DG_DISK1A) in group 2 (DG_MIRROR) is online for reads

The ASM alert log shows the following when the disk is replaced:
Tue Jul 09 00:34:29 2013
SQL> alter diskgroup DG_MIRROR replace disk DG_DISK1A with 'ORCL:DG_DISK4A'
Tue Jul 09 00:34:29 2013
NOTE: Found ORCL:DG_DISK4A for disk DG_DISK1A
NOTE: initiating resync of disk group 2 disks
DG_DISK1A (3)

NOTE: process _user19078_+asm (19078) initiating offline of disk 3.3915922419 (DG_DISK1A) with mask 0x7e in group 2 (DG_MIRROR) without client assisting
NOTE: checking PST: grp = 2
Tue Jul 09 00:34:29 2013
GMON checking disk modes for group 2 at 58 for pid 29, osid 19078
Tue Jul 09 00:34:29 2013
NOTE: checking PST for grp 2 done.
Tue Jul 09 00:34:29 2013
NOTE: sending set offline flag message (3778936899) to 1 disk(s) in group 2
Tue Jul 09 00:34:29 2013
WARNING: Disk 3 (DG_DISK1A) in group 2 mode 0x1 is now being offlined
Tue Jul 09 00:34:29 2013
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0xe9683bf3, mask = 0x6a, op = clear
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 59 for pid 29, osid 19078
Tue Jul 09 00:34:29 2013
NOTE: cache closing disk 3 of grp 2: (not open) DG_DISK1A label:DG_DISK4A
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0xe9683bf3, mask = 0x7e, op = clear
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 60 for pid 29, osid 19078
Tue Jul 09 00:34:29 2013
NOTE: cache closing disk 3 of grp 2: (not open) DG_DISK1A label:DG_DISK4A
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
NOTE: requesting all-instance membership refresh for group=2
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0x0, mask = 0x11, op = assign
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 61 for pid 29, osid 19078
Tue Jul 09 00:34:29 2013
NOTE: cache closing disk 3 of grp 2: (not open) DG_DISK1A label:DG_DISK4A
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
NOTE: requesting all-instance disk validation for group=2
Tue Jul 09 00:34:29 2013
NOTE: disk validation pending for 1 disk in group 2/0xd578cb13 (DG_MIRROR)
NOTE: Found ORCL:DG_DISK4A for disk DG_DISK1A
NOTE: completed disk validation for 2/0xd578cb13 (DG_MIRROR)
Tue Jul 09 00:34:29 2013
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0x0, mask = 0x19, op = assign
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 62 for pid 29, osid 19078
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
Tue Jul 09 00:34:29 2013
NOTE: membership refresh pending for group 2/0xd578cb13 (DG_MIRROR)
Tue Jul 09 00:34:29 2013
GMON querying group 2 at 63 for pid 14, osid 5150
NOTE: cache opening disk 3 of grp 2: DG_DISK1A label:DG_DISK4A
Tue Jul 09 00:34:29 2013
SUCCESS: refreshed membership for 2/0xd578cb13 (DG_MIRROR)
Tue Jul 09 00:34:29 2013
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0x0, mask = 0x5d, op = assign
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 64 for pid 29, osid 19078
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0x0, mask = 0x7d, op = assign
Tue Jul 09 00:34:29 2013
GMON updating disk modes for group 2 at 65 for pid 29, osid 19078
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:34:29 2013
NOTE: PST update grp = 2 completed successfully
Tue Jul 09 00:34:29 2013
NOTE: Voting File refresh pending for group 2/0xd578cb13 (DG_MIRROR)
Tue Jul 09 00:34:29 2013
SUCCESS: alter diskgroup DG_MIRROR replace disk DG_DISK1A with 'ORCL:DG_DISK4A'
NOTE: Attempting voting file refresh on diskgroup DG_MIRROR
Tue Jul 09 00:34:30 2013
NOTE: starting rebalance of group 2/0xd578cb13 (DG_MIRROR) at power 1
Starting background process ARB0
Tue Jul 09 00:34:30 2013
ARB0 started with pid=33, OS id=19100
NOTE: assigning ARB0 to group 2/0xd578cb13 (DG_MIRROR) with 1 parallel I/O
Tue Jul 09 00:34:31 2013
NOTE: header on disk 0 advanced to format #2 using fcn 0.724
NOTE: header on disk 5 advanced to format #2 using fcn 0.724
Tue Jul 09 00:35:40 2013
NOTE: initiating PST update: grp 2 (DG_MIRROR), dsk = 3/0x0, mask = 0x7f, op = assign
Tue Jul 09 00:35:40 2013
GMON updating disk modes for group 2 at 84 for pid 33, osid 19100
NOTE: group DG_MIRROR: updated PST location: disk 0000 (PST copy 0)
NOTE: group DG_MIRROR: updated PST location: disk 0004 (PST copy 1)
Tue Jul 09 00:35:40 2013
NOTE: PST update grp = 2 completed successfully
NOTE: reset timers for disk: 3
NOTE: completed online of disk group 2 disks
DG_DISK1A (3)

Tue Jul 09 00:35:45 2013
NOTE: requesting all-instance membership refresh for group=2
Tue Jul 09 00:35:45 2013
NOTE: membership refresh pending for group 2/0xd578cb13 (DG_MIRROR)
Tue Jul 09 00:35:46 2013
GMON querying group 2 at 85 for pid 14, osid 5150
Tue Jul 09 00:35:46 2013
SUCCESS: refreshed membership for 2/0xd578cb13 (DG_MIRROR)
NOTE: stopping process ARB0
NOTE: Attempting voting file refresh on diskgroup DG_MIRROR
Tue Jul 09 00:35:48 2013
SUCCESS: rebalance completed for group 2/0xd578cb13 (DG_MIRROR)

How the disks in the diskgroup show up after the replacement:
alpddbs002:{+ASM}:/home/oracle >asmcmd lsdsk -k -G dg_mirror
Total_MB  Free_MB  OS_MB  Name       Failgroup  Failgroup_Type  Library                                               Label      UDID  Product  Redund   Path
   10236     9617  10236  DG_DISK1A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK1A                 UNKNOWN  ORCL:DG_DISK4A
   10236     9613  10236  DG_DISK1B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK1B                 UNKNOWN  ORCL:DG_DISK1B
   10236     9616  10236  DG_DISK2A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK2A                 UNKNOWN  ORCL:DG_DISK2A
   10236     9614  10236  DG_DISK2B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK2B                 UNKNOWN  ORCL:DG_DISK2B
   10236     9610  10236  DG_DISK3A  FG_1       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK3A                 UNKNOWN  ORCL:DG_DISK3A
   10236     9616  10236  DG_DISK3B  FG_2       REGULAR         ASM Library - Generic Linux, version 2.0.4 (KABI_V2)  DG_DISK3B                 UNKNOWN  ORCL:DG_DISK3B




Conclusion

The replace feature is faster and more efficient because it does not require a full rebalance of the diskgroup. The replacement disk takes over the name and failure group of the disk being replaced (DG_DISK1A in FG_1 here, now on path ORCL:DG_DISK4A) and is populated with the same data that disk held. That data is sourced from the mirror copies spread across the other disks in the diskgroup.
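
As a rough measure of the cost in this test, the elapsed time can be computed from the ASM alert-log timestamps shown earlier: the rebalance was started at 00:34:30 and reported complete at 00:35:48. The helper names below are made up for illustration, and the sketch assumes both timestamps fall on the same day.

```shell
# Hypothetical helpers: seconds between two HH:MM:SS timestamps (same day).
to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

elapsed_seconds() {
    echo $(( $(to_seconds "$2") - $(to_seconds "$1") ))
}

# "starting rebalance" to "rebalance completed" in the ASM alert log
elapsed_seconds 00:34:30 00:35:48   # prints 78
```

The disks in this test are small (about 10 GB each and nearly empty), so this figure says little about how long a replace takes on a production-sized disk.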

My webpages

http://db12c.blogspot.com/
http://cloudcontrol12c.blogspot.com/

http://www.youtube.com/user/jfruiz11375


References

12c Automatic Storage Management Administrator's Guide
Efficient Disk Replacement with ASM Release 12.1 by Peter Fusek

4 comments:

  1. Hi,

    Our 12c ASM instance crashed recently because of a hard disk failure. One thing I don't understand: the SAN does have redundancy, so shouldn't this be transparent to ASM? Thanks in advance for your help.

    1. What version of ASM? Do you have normal redundancy on the diskgroups?

  2. javier,

    We have 12c ASM, and the disk group that failed holds the OCR (voting disks) and is on high redundancy.

    1. If that diskgroup is high redundancy, then it should not have crashed as long as two of the disks stayed online. Check the alert log of the ASM instance to see whether two of the devices went offline or had some other issue. It is recommended that the voting disks come from different arrays in the SAN to protect against an array outage. You can even use two different SAN storage systems and an NFS mount for the voting files, which gives you high redundancy across three physical locations.
