Wednesday, October 01, 2008

Replacing a Boot Disk on a System Running OpenSolaris

Recently the boot disk of my workstation, which runs OpenSolaris 2008.05, started giving me media errors. For the record, the errors looked like this:



 WARNING: /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0 (sd2):
     Error for Command: read(10)                Error Level: Retryable
       Requested Block: 4213122                   Error Block: 4213126
       Vendor: FUJITSU                            Serial Number: AAG0P5300991
       Sense Key: Media_Error
       ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
 WARNING: /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0 (sd2):
     Error for Command: read(10)                Error Level: Retryable
       Requested Block: 4213122                   Error Block: 4213126
       Vendor: FUJITSU                            Serial Number: AAG0P5300991
       Sense Key: Media_Error
       ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
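To see how often the disk is failing, the warnings can be tallied from the system log. This is only a minimal sketch: it assumes the messages end up in a syslog-style text file (on Solaris that is usually /var/adm/messages) and look like the ones above. The function name count_media_errors is my own.

```shell
# count_media_errors FILE: print how many "Sense Key: Media_Error" lines
# appear in FILE. One such line is logged per failed command.
count_media_errors() {
    grep -c 'Sense Key: Media_Error' "$1"
}

# Intended use on the affected system (path is an assumption):
#   count_media_errors /var/adm/messages
```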


I did not want to wait until the whole disk died, so I started thinking about replacing it. I asked our local lab manager for a replacement disk, and within a couple of days he came back with one. So, with the replacement disk in hand and the failing boot disk still serving as the primary drive in the system, I set out to figure out how to swap the disks without re-installing the system.



First of all, I am using an EOL product called the Sun Java Workstation W2100z. Fortunately it has extra disk slots. After halting OpenSolaris 2008.05, I opened the system without a screwdriver (it has twist screws) and inserted the new disk into one of the available disk slots, which already had ribbon cables and power connected for it. I booted OpenSolaris again and could see both disks. Great, now what?


My plan was to give the new disk the same slice layout as my boot disk, so here are the steps I took to make it match:


 # format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c5t0d0 <DEFAULT cyl 17879 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4/sd@0,0
       1. c6t0d0 <DEFAULT cyl 17846 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0
Specify disk (enter its number): 0
selecting c5t0d0
[disk formatted]
/dev/dsk/c5t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M).
/dev/dsk/c5t0d0s1 is currently used by swap. Please see swap(1M).
format> par
partition> name
Enter table name (remember quotes): "myroot"
partition> pr
Current partition table (myroot):
Total disk cylinders available: 17879 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm     262 - 17878      134.95GB    (17617/0/0) 283017105
  1       swap    wu       1 -   261        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 17878      136.96GB    (17879/0/0) 287226135
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition> quit
format> disk
AVAILABLE DISK SELECTIONS:
       0. c5t0d0 <DEFAULT cyl 17879 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4/sd@0,0
       1. c6t0d0 <DEFAULT cyl 17846 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0
Specify disk (enter its number)[1]: 1
selecting c6t0d0
[disk formatted]
format> ver
WARNING - This disk may be in use by an application that has
      modified the fdisk table. Ensure that this disk is
      not currently in use before proceeding to use fdisk.
format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table.
y
format> par
partition> pr
Current partition table (original):
Total disk cylinders available: 17846 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 17845      136.71GB    (17846/0/0) 286695990
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition>


Unfortunately, because the two disks report different cylinder counts (17879 versus 17846), I could not use "select" to apply the "myroot" table to the new disk, so I had to set up the slices manually to resemble my boot disk.


partition> 0
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 262
Enter partition size[0b, 0c, 262e, 0.00mb, 0.00gb]: 17845e
partition> 1
Part      Tag    Flag     Cylinders         Size            Blocks
  1 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1
Enter partition size[0b, 0c, 1e, 0.00mb, 0.00gb]: 261e
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 17846 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm     262 - 17845      134.70GB    (17584/0/0) 282486960
  1 unassigned    wm       1 -   261        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 17845      136.71GB    (17846/0/0) 286695990
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition> lab
Ready to label disk, continue? y

partition>
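The block counts that format prints follow directly from the disk geometry, which makes it easy to double-check the numbers entered above. This is a small sketch using the 255 heads and 63 sectors per track that format reports for both disks. The function name cyl_to_blocks is my own.

```shell
# Geometry reported by format for both disks.
HEADS=255
SECTORS=63

# cyl_to_blocks N: number of disk blocks contained in N cylinders.
cyl_to_blocks() {
    echo $(( $1 * HEADS * SECTORS ))
}

cyl_to_blocks 17879   # old disk, slice 2       -> 287226135
cyl_to_blocks 17846   # new disk, slice 2       -> 286695990
cyl_to_blocks 17617   # old slice 0 (262-17878) -> 283017105
cyl_to_blocks 17584   # new slice 0 (262-17845) -> 282486960
```

The results match the Blocks column in the tables above. They also show that slice 0 on the new disk is 33 cylinders smaller than on the old one.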



Now the layout of the new disk resembled that of my boot disk. Next I needed to tell ZFS to move to the new disk. I wasn't comfortable doing all this from within the GUI, so I switched to the text console by disabling the graphical login with the following command:


# svcadm disable gdm


In text login mode, I used the following command to move the ZFS rpool to the new disk:


# zpool replace rpool c5t0d0s0 c6t0d0s0


(Sometimes you may have to use the force flag if there is already some data on c6t0d0s0:)


# zpool replace -f rpool c5t0d0s0 c6t0d0s0



The command returned quite quickly, but I then found that it starts a background task (a resilver) that moves the pool to the new disk. I observed it by using


# zpool iostat -v


to see the I/O going from the old disk to the new one, and also


# zpool status -v
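Rather than re-running the status command by hand, the wait can be scripted. This is only a sketch and makes one loud assumption: that the status text contains the phrase "resilver in progress" while the copy is running. Check this against the actual output on your system first. The function names are my own.

```shell
# resilver_in_progress: read `zpool status` text on stdin and succeed
# while a resilver is still reported as running.
# ASSUMPTION: the status output contains the phrase "resilver in progress".
resilver_in_progress() {
    grep 'resilver in progress' >/dev/null
}

# wait_for_resilver POOL [SECONDS]: poll the pool status every SECONDS
# (default 60) until no resilver is reported any more.
wait_for_resilver() {
    while zpool status "$1" | resilver_in_progress; do
        sleep "${2:-60}"
    done
}

# Intended use:
#   wait_for_resilver rpool 60 && zpool status -v rpool
```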


Once the replacement had completed successfully, which can take a while, I made sure the swap location in /etc/vfstab was also moved to the new disk:


# vi /etc/vfstab


(replace c5t0d0s1 with c6t0d0s1)
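The same edit can be done without an editor. This is a minimal sketch, assuming the swap entry refers to the device as /dev/dsk/c5t0d0s1. The function name retarget_vfstab and the backup file name are my own. It writes to standard output, so the original file is only overwritten once you have looked at the result.

```shell
# retarget_vfstab OLD NEW FILE: print FILE with every /dev/dsk/OLD
# replaced by /dev/dsk/NEW. FILE itself is not modified.
retarget_vfstab() {
    old=$1 new=$2 file=$3
    sed "s|/dev/dsk/$old|/dev/dsk/$new|g" "$file"
}

# Intended use (keeps a backup copy first):
#   cp /etc/vfstab /etc/vfstab.orig
#   retarget_vfstab c5t0d0s1 c6t0d0s1 /etc/vfstab.orig > /etc/vfstab
```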

Next, I made sure GRUB is also installed on the new disk, so that it can find the ZFS root there:



# cd /boot/grub


# installgrub stage1 stage2 /dev/rdsk/c6t0d0s0


As the final step before rebooting the operating system, I made sure the boot archive (the mini-root) was updated with the new information:


# bootadm update-archive


Then I rebooted the system. There is a bit of a timing issue here, since I need to press F2 at the right moment to get into the CMOS setup menu and select the other disk to boot from. Booting from the new hard disk and seeing a working GRUB menu gave me a positive feeling that things were going right. Selecting the OpenSolaris entry started booting Solaris again. Since I had disabled the GUI login, it took me to the text login prompt, where I logged in with my primary ID and then used "su" to get a root shell.


I used "zpool status -v" and "swap -l" to verify that rpool and swap were pointing to the new disk. Once I was satisfied, I enabled the GUI login as follows:


# svcadm enable gdm; exit


(Note: as soon as you enable the GUI via gdm, it starts X Windows and leaves your shell hanging, still logged in. Hence the extra ";exit", which exits the shell immediately after starting the GUI.)
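The check done before re-enabling gdm can also be scripted. This is a small sketch that succeeds only if the command output mentions the new disk and no longer mentions the old one. The function name points_to_new_disk is my own, and it only does plain substring matching on whatever text it is given.

```shell
# points_to_new_disk OLD NEW: read command output on stdin and succeed
# only if NEW appears in it and OLD does not.
points_to_new_disk() {
    out=$(cat)
    case $out in *"$1"*) return 1 ;; esac   # old disk still referenced
    case $out in *"$2"*) return 0 ;; esac   # new disk referenced
    return 1                                # neither disk mentioned
}

# Intended use on the running system:
#   zpool status -v rpool | points_to_new_disk c5t0d0 c6t0d0 && echo "rpool ok"
#   swap -l               | points_to_new_disk c5t0d0 c6t0d0 && echo "swap ok"
```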


Then I took the system down again, removed the dying disk, and sent that disk back to my lab manager.
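For reference, the non-interactive part of the procedure can be condensed into a dry-run sketch. It only prints the commands used in this post for a given pair of disks and executes nothing. The interactive format session for labelling the new disk is not covered. The function name print_plan is my own.

```shell
# print_plan OLD NEW: print the commands from this post for the given
# old and new disk names. Nothing is executed.
print_plan() {
    old=$1 new=$2
    cat <<EOF
svcadm disable gdm
zpool replace rpool ${old}s0 ${new}s0
zpool status -v
vi /etc/vfstab    # change ${old}s1 to ${new}s1
cd /boot/grub && installgrub stage1 stage2 /dev/rdsk/${new}s0
bootadm update-archive
EOF
}

print_plan c5t0d0 c6t0d0
```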





