
Sunday, April 05, 2009

Effects of Flash/SSD on PostgreSQL - PostgreSQL East 2009

I presented a talk at PostgreSQL East 2009 at Drexel University this morning. The topic was "Effects of Flash/SSDs on PostgreSQL". Here are the slides from the presentation.








If you have questions, please leave comments.



Wednesday, March 11, 2009

Thundering Herd of Elephants for Scalability

During last year's PGCon 2008 I presented "Problems with PostgreSQL on Multi-core Systems". On slide 14 I talked about results from IGEN at various think times and identified how difficult it is to scale with an increasing number of users. The chart showed how tests with 200ms think time saturate at about 1,000 active users, after which throughput starts to fall. On slides 16 & 17 I identified ProcArrayLock as the culprit behind scalability tanking as the number of users increases.

Today I was intrigued by the same problem again while trying out PostgreSQL 8.4 builds, and once again hit the bottleneck at about 1,000 users, frustrated that PostgreSQL cannot scale even with only one active socket (64 strands) of a Sun SPARC Enterprise T5240, which is a two-socket UltraSPARC T2 Plus system.

Again I dug through the source code of the PostgreSQL 8.4 snapshot to see what could be done. While reviewing lwlock.c I thought of a change, quickly modified a couple of lines, recompiled the database binaries, and re-ran the test. The results are shown below (8.4 vs. 8.4fix).





The setup was quite standard (as how I usually test). The database and log files are on RAM (/tmp). The think times were about 200ms for each user between transactions. But the results were quite pleasing: on the same system setup, TPM throughput went up about 1.89x, and response time now shows a gradual increase rather than a knee-jerk spike.

So what was the change in the source code? Before I explain what I did, let me explain what PostgreSQL is trying to do in that code path.

After a process acquires a lock and does its critical work, when it is ready to release the lock it finds the next postgres process that it needs to wake up, so as not to starve processes trying to take exclusive locks, and also, I think, to avoid a thundering-herd problem. That is a good thing to do on single-core systems, where CPU resources are scarce and using them efficiently results in better performance. However, on systems with many CPU resources (say the 256 threads of a Sun SPARC Enterprise T5440) this becomes an artificial bottleneck, since it is not the system but the application that determines which process should wake up next and try to get the lock.

The change I made was to discard the selective waiter wake-up and just wake up all waiters waiting on that lock, letting the processes, the OS dispatcher, and the CPU resources do their magic in their own way (and that worked, and worked very well).



           /* In LWLockRelease(): wake all waiters instead of stopping at the
              first exclusive waiter; the original conditions are left
              commented out. */
           /* if (!proc->lwExclusive) */
           if (1)
           {
                   while (proc->lwWaitLink != NULL &&
                          1 /* !proc->lwWaitLink->lwExclusive */)
                           proc = proc->lwWaitLink;
           }




It's amazing that last year I tried so many different approaches to the problem, and such a simple approach proved to be more effective.

I have put a proposal to the PostgreSQL performance mailing list that a postgresql.conf tunable be defined for PostgreSQL 8.4, so people can tweak their instances to wake all waiters as above without impacting the existing behavior for everyone else.

In the meantime, if you compile your own PostgreSQL binaries, try the workaround if you are running PostgreSQL on any system with 16 or more cores/threads, and provide feedback about the impact on your workloads.
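If you do build from source, the workaround amounts to a tiny edit plus a rebuild. A minimal sketch (directory and install prefix are illustrative; the wake-up loop lives in LWLockRelease() in lwlock.c):

cd postgresql-8.4snapshot
vi src/backend/storage/lmgr/lwlock.c   # apply the wake-all change shown above
./configure --prefix=$HOME/pg84fix
gmake && gmake install                 # gmake is GNU make on Solaris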





Friday, March 06, 2009

Solaris 10 10/08 (Update 6) Update Manager

Ever since I upgraded my workstation, a Sun Blade 2000, to Solaris 10 10/08 (Update 6), I had noticed that my patch update connection no longer worked. Initially it would give me some weird Cocoa error, or sometimes an empty dialog box with just OK in it.


Today I was finally determined to fix it so I can apply the latest patches automatically instead of downloading them manually. Anyway, the trick is to read these two Sun Forum entries:



  1. Entry regarding the Solaris 10 patch  that was released in January 2009

  2. Entry regarding the correct Java version that is required by underlying smpatch. 


In short, in my case I had to install patch 121118-15 (for SPARC) and also remove references to JDK 1.6_10 in /usr/jdk to make Update Manager work again on my desktop. My trusty Update Manager is now finding and installing recent patches on Solaris 10 10/08.
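For reference, the command-line checks look roughly like this (a sketch; the patch number comes from the forum entries above):

patchadd -p | grep 121118   # is 121118-15 already applied?
java -version               # confirm which JDK smpatch picks up
smpatch update              # should work again after the fixes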



Thursday, March 05, 2009

PostgreSQL Conference East 2009 & upcoming 8.4 beta

Sometimes you need a kickstart to get moving from inertia. Considering that I haven't blogged at all in 2009, I welcome the PostgreSQL Conference East 2009 email reminders to get me going again. They will motivate me to do some targeted work with PostgreSQL for the community. I will be presenting on SSDs and PostgreSQL at the event. Unlike Robert Treat, who can pretty much prepare all his slides during the lunch break of the conference, some of us require more time. As a reminder, PostgreSQL East 2009 registration has already started.


I am also looking forward to the PostgreSQL 8.4 beta and to test-driving it on a 128-thread Sun SPARC Enterprise T5240.






Tuesday, December 16, 2008

PostgreSQL 8.3 Appliance based on OpenSolaris using VirtualBox

Based on Alex Eremin's blog entry on a minimal script, plus some hacking of my own, I have a script that lets you easily create a basic PostgreSQL 8.3 appliance using the OpenSolaris 2008.11 kernel in VirtualBox, with an image that is not bloated (no X, no GNOME, etc.).


Here is how to try it out:


  • For the time being this requires downloading the OpenSolaris 2008.11 CD image (650+ MB) just to execute my script create_pg_appliance.sh, which is about 6KB. Bear with me till I solve this problem, and play along :-)
  • Download Virtualbox and install it
  • Create a New Virtual Box VM with following parameters in the Wizard Screen
    • Call it PostgreSQL 8.3 Appliance
    • Select OS Type "OpenSolaris" 
    • Base Memory  768MB or increase it to 1GB if you can spare your RAM for it
    • Create a new dynamically expanding image with exactly 16.00 GB (any other size may not work)
  • Once the VM is created, immediately  click the blue Network link and modify it to select a "Host Based Network"  (in the setup make sure it is connected to the active host interface - wired or wireless depending on the host system)
  • Also Click the CD-ROM image and point it to the osol-200811.iso image that you downloaded earlier
  • Boot up the VM and select the first LiveCD option in the Grub Menu options
  • Select through the defaults to get the full Gnome Desktop
  • Open Firefox in the VM and make  sure your VM  has internet access
  • Open a terminal window and execute the following commands in sequence
    • wget "http://blogs.sun.com/jkshah/resource/create_pg_appliance.sh"
    • pfexec sh -x create_pg_appliance.sh
    • pfexec reboot
  • At this point stop the VM, disconnect the CD image from the VM, and boot again, or select the GRUB entry
    • Boot from Hard disk
  • Once the VM boots from the hard disk, select the  grub menu item
    • PostgreSQL 8.3 Appliance based on OpenSolaris 2008.11
  • The Operating system boots up to give a login prompt. Login as root/opensolaris
  • The PostgreSQL database server is already running with PGDATA in /var/postgres/8.3/data
  • Modify postgresql.conf to add the listen_addresses = '*' parameter
  • Modify pg_hba.conf to allow your clients to connect (both edits are sketched right after this list)
  • Restart the PostgreSQL server instance as follows:
    • svcadm restart postgresql_83:default_32bit
  • Now the PostgreSQL Appliance is ready to be connected from your clients.
  • Remember to take snapshots of rpool/VOSApp/var so you can always revert back to earlier copies if you ever need it.
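The two configuration edits above can be scripted; a minimal sketch (the client subnet in the pg_hba.conf line is a placeholder for your own network):

PGDATA=/var/postgres/8.3/data
echo "listen_addresses = '*'" >> $PGDATA/postgresql.conf
echo "host all all 192.168.1.0/24 md5" >> $PGDATA/pg_hba.conf
svcadm restart postgresql_83:default_32bit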
[Image: GRUB menu]


Note: In a trial installation on a MacBook Pro the script executed in about 11 minutes, which includes the time it takes to download packages from pkg.opensolaris.org. I thought that was fantastic.


Maybe I should also create a minimized LiveCD just to execute the script on VirtualBox. If you have problems executing the script, let me know.





Friday, December 12, 2008

Call for Papers for CommunityOne (New York) on March 18 2009

This year CommunityOne is also being held in New York (March 18, 2009) in addition to the CommunityOne held in San Francisco (June 1, 2009). CommunityOne is a conference sponsored by Sun Microsystems focused on open source innovation and implementation. It brings together developers, technologists and students for technical education and exchange.


The Call for Papers is open and has been extended to December 19. Since quite a bit of the PostgreSQL crowd is on the East Coast, maybe we should put in some PostgreSQL proposals, at least for the New York event in March 2009. Any takers/suggestions?


Here is what comes to my mind immediately:




Others that I missed?




Tuesday, December 09, 2008

OpenSolaris 2008.11 and PostgreSQL 8.3


The second supported version of OpenSolaris, OpenSolaris 2008.11, is now officially launched. With this release, I guess PostgreSQL 8.3 is now "officially" supported on OpenSolaris too (though you could have used it earlier with the development builds that followed the initial OpenSolaris 2008.05 release).


Anyway, let's quickly look at some tips and how-tos that I think could improve the experience of PostgreSQL 8.3 on OpenSolaris 2008.11. One way to take a quick test drive of OpenSolaris 2008.11 is to try out the new VirtualBox 2.0.6 and install OpenSolaris 2008.11 in a VM. (Note: if you plan to access the VM from other systems, use Host Interface networking instead of NAT to make it easy to reach from outside.)


PostgreSQL 8.3 is not installed by default when you install OpenSolaris 2008.11 from the LiveCD. This gives an opportunity for a tweak that I would highly recommend. By default OpenSolaris 2008.11 does not offer an option for a separate ZFS dataset for /var (like Solaris 10 10/08 - U6 does). I certainly like my PostgreSQL database on a separate dataset so that I can control PostgreSQL snapshots independently. Since the default database path for PostgreSQL 8.3 on OpenSolaris is /var/postgres/8.3, I generally create a separate dataset as follows before using pkg to install PostgreSQL 8.3:


# zfs create -o mountpoint=/var/postgres rpool/postgres
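With the databases on their own dataset, snapshots become trivial later on; for example (snapshot name is arbitrary):

# zfs snapshot rpool/postgres@pre-upgrade
# zfs rollback rpool/postgres@pre-upgrade   (stop the database first)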


Now I am ready to install the various packages of PostgreSQL 8.3


# pkg install SUNWpostgr-83-server


# pkg install SUNWpostgr-83-client SUNWpostgr-jdbc SUNWpostgr-83-contrib


# pkg install SUNWpostgr-83-docs  SUNWpostgr-83-devel


# pkg install SUNWpostgr-83-tcl SUNWpostgr-83-pl


Now install pgAdmin III as follows:


# pkg install SUNWpgadmin3


Even pgbouncer is available in the repository


# pkg install SUNWpgbouncer-pg83


Now that all the binaries are installed, let's look at how to get started. Generally the best way to start and stop the PostgreSQL 8.3 server is via the svcadm command of OpenSolaris. The SMF manifest for PostgreSQL 8.3 is installed with SUNWpostgr-83-server; however, the import of the manifest does not happen till the next reboot. We can always work around that by doing a quick manual import as follows:


# svccfg import /var/svc/manifest/application/database/postgresql_83.xml


This creates two entries, one for a 32-bit server instance and one for a 64-bit server instance.


# svcs -a |grep postgres
disabled        1:14:44 svc:/application/database/postgresql_83:default_64bit
disabled        1:14:45 svc:/application/database/postgresql_83:default_32bit


Depending on your server and your preference you can start the corresponding instance; the 32-bit instance may be the easier choice if in doubt.


# svcadm enable postgresql_83:default_32bit


# svcs -a |grep postgres
disabled        1:14:44 svc:/application/database/postgresql_83:default_64bit
online          2:12:37 svc:/application/database/postgresql_83:default_32bit


# zfs list rpool/postgres
NAME             USED  AVAIL  REFER  MOUNTPOINT
rpool/postgres  30.4M  11.8G  30.4M  /var/postgres


The psql client is still not in the default path; /usr/postgres/8.3/bin should be added to your search PATH (or /usr/postgres/8.3/bin/64 if you are using the 64-bit server).
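For example, to put the 32-bit binaries on your path in a Bourne-style shell:

# PATH=/usr/postgres/8.3/bin:$PATH; export PATH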

# /usr/postgres/8.3/bin/psql -U postgres postgres
Welcome to psql 8.3.4, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
       \h for help with SQL commands
       \? for help with psql commands
       \g or terminate with semicolon to execute query
       \q to quit

postgres=#





For people already using PostgreSQL 8.2 on OpenSolaris 2008.05, there is a way to get to PostgreSQL 8.3. First, update your OpenSolaris 2008.05 image to OpenSolaris 2008.11 using


# pkg image-update


Then install PostgreSQL 8.3 binaries as above and also install an additional package


# pkg install SUNWpostgr-upgrade


If there are more questions, feel free to leave a comment.








Thursday, December 04, 2008

PostgreSQL and Netbeans 6.5

With the release of JavaFX 1.0, a new version of NetBeans, 6.5, is now also available. Let's quickly look at its current PostgreSQL support.


The JDBC driver for PostgreSQL is integrated in Netbeans 6.5. It is  located at:


netbeans-6.5/ide10/modules/ext/postgresql-8.3-603.jdbc3.jar


As you can see it is pretty much in sync with the latest driver (about one rev back, since the latest is build 604). Clicking the Services tab in NetBeans shows Databases, where a new entry can be created using "Direct URL Entry" with a URL of the following format:


jdbc:postgresql://hostname/databasename


Provide the user name and password for the connection, then the schema name (hint: public) in the Advanced tab, and press OK to create a new connection to the PostgreSQL database.


Once connected, click the expand sign (which looks like o-) to see Tables/Views/Procedures; Tables can be expanded further to see data, either using the menu options or the SQL Editor.


With the PostgreSQL setup in place, it should now be easier to program with your favorite language, be it Java, JavaFX, Python, PHP, Ruby, etc., using a PostgreSQL database as the backend.


(Note to myself: try out Greg Smith's pgtune project, which uses Python, with NetBeans's new Python editor.)






Tuesday, November 18, 2008

Quest for OpenSolaris based Appliance

Recently I burned a copy of OpenSolaris 2008.11 RC1 and used it with Songbird (via pkg install SUNWsongbird), the Fluendo MP3 decoder (which is free for OpenSolaris), and Flash, and soon I had a setup in my bedroom, which I love to call the OpenSolaris Home Theatre Edition, that I used to listen to my song collection and watch online shows, all controlled via Remote Desktop. To me it was a serious contender to those Media Centers out there.


However, I realized that while I wanted to "pump it up" for my personal usage, I really wanted to "strip it down" for business usage. What I mean is: to get someone to try out a piece of open source software, it is easiest to offer that software pre-installed on OpenSolaris in a VirtualBox image. However, I found that very hard to do in practice.


Say, for example, I want to get somebody to try out the latest version of MySQL on OpenSolaris. The easiest thing is to give them a VDI image of a preinstalled MySQL 5.1 RC running on OpenSolaris, which somebody just double-clicks, and boom, a VirtualBox instance starts up with MySQL ready to deliver.


However, there is a problem with that theory. The VDI image of a fresh OpenSolaris install in a VirtualBox instance is about 2.5 - 3 GB. Of course, adding MySQL on top of it won't drastically increase the size, but I still have a problem with the base size itself. The only way this can work is with some sort of peer-to-peer file sharing technology, since hosting DVD-size downloads and hoping people will fetch them without any problems is like showing up at 10:00am for a Thanksgiving deal at a store that opened at 6:00am with fewer than 20 SKUs available (tough luck!!).


Anyway, I started dissecting how to reduce the size of the VDI image. There are a few tricks provided by VirtualBox itself to release zeroed-out sectors using


VBoxManage modifyvdi VDINAME compact


However, trying this on a VDI holding the OpenSolaris 2008.11 RC1 candidate did not really help.
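A related trick (a sketch; I have not verified it helps in this case) is to zero out the guest's free space first, since compact can only release blocks that read back as zeros, and then compact on the host:

# inside the guest: fill free space with zeros, then remove the file
dd if=/dev/zero of=/junkfile bs=1024k   # dd stops when the disk is full; expected
rm /junkfile
# on the host (VDI name illustrative):
VBoxManage modifyvdi OpenSolaris.vdi compact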


The next thing I tried was to list all the installed packages using


pkg list


There are about 550 packages installed as reported by that list. I tried removing many of the non-essential desktop things (aha, who needs Xorg, GNOME, Firefox, Thunderbird) and reduced it to fewer than 400 packages. However, even with that, the VDI image is still not much smaller than 2.5GB. I tried the trick of cloning the disk to another VDI via a ZFS replace, thinking it would get rid of all the space freed up by removing the packages, but the resulting new VDI image was still 2.5GB.


Looking at the zfs list output, I found that my VirtualBox instance with 512MB of RAM has three main ZFS file systems, which account for the approximately 2.5GB usage:


  • rpool/ROOT/opensolaris - the legacy mount which is root (/), about 1.3 GB
  • rpool/dump - about 512MB
  • rpool/swap - again about 512MB (dump and swap can be shrunk; see the sketch below)
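Since dump and swap are just ZFS volumes, one way to claw back some space (a sketch; the sizes are illustrative, and dumpadm may complain if the dump device gets too small):

# zfs set volsize=256m rpool/dump
# zfs set volsize=256m rpool/swap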


Now the goal is to reduce the VDI size without creating complications for the end user trying out the VDI, while still keeping the capability of going back to the fully installed version of OpenSolaris.


Hence the quest continues...




Wednesday, November 12, 2008

Snapshots with PostgreSQL and Sun Storage 7000 Unified Storage Systems

Continuing my earlier blog entry on setting up PostgreSQL with Sun Storage 7000 Unified Storage (Amber Road), let's now look at how to take snapshots on such systems.


Considering a typical PostgreSQL system, there are two ways to take a backup:


Cold or offline backup: backing up PostgreSQL while the PostgreSQL server is not running. The advantage of such a backup is simplicity; the disadvantage is that the database system is unavailable for work. The general steps for a cold backup are: shut down the PostgreSQL server gracefully, then back up all the files, including $PGDATA, pg_xlog if it is in a different location, and all the tablespace locations used by that server instance. After backing up the files, start the PostgreSQL server again and you are done. If you ever want to go back to that earlier backup version, you shut down the database, restore all the old files, and restart the database. This is essentially the same logic we will follow with snapshots, except that if you have a multi-gigabyte database, the snapshot operation will look a million times faster than doing a "tar" or "cpio" (especially considering that "tar" is a single-threaded application doing its best, but still slow on multi-core systems).
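For comparison, the traditional file-level cold backup that the snapshot replaces looks roughly like this (a sketch; any relocated pg_xlog and all tablespaces must be included too):

pg_ctl -D $PGDATA stop
tar cf /backup/pgdata.tar $PGDATA
pg_ctl -D $PGDATA start -l $PGDATA/server.log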


Let's consider an example of how to take a snapshot on OpenStorage systems with the cold/offline backup strategy.


Assume that the Sun Storage 7000 Unified Storage System has the hostname "amberroad" ("Sun Storage 7000 Unified Storage Systems" is a mouthful) and has a project defined called "postgres", which has three shares exported: "pgdata", which holds $PGDATA; "pglog", which holds $PGDATA/pg_xlog; and "pgtbs1", which is one of the tablespaces defined.


Then a backup/snapshot script (run as the postgres user) will look something like the following (Disclaimer: provided AS-IS; you will have to modify it for your own setup):



#!/bin/sh
date
echo "---------------------------------------------------------------"
echo "Shutting down the PostgreSQL database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot called testsnap.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
date


Note: testsnap must not already exist, otherwise the snapshot operation will fail; hence the script first deletes any earlier snapshot called testsnap. You may modify the script to suit your priorities.


Now, in case you want to restore to the snapshot "testsnap" version, you can use a sample script as follows:



#!/bin/sh
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored to earlier snapshot: testsnap."
echo "---------------------------------------------------------------"
date



NOTE: When you use the above script, all snapshots taken after "testsnap" are lost, as the shares are rolled back to the point in time when "testsnap" was taken. There is also a feature to clone a snapshot, in case you don't want to use rollback.



Hot or online backup: a hot or online backup is taken while the PostgreSQL server is running. The advantage is that the database server remains available. The disadvantage is that the setup is more complex than the cold/offline strategy. The recommended way to do hot backups with PostgreSQL is in conjunction with its PITR (Point-In-Time Recovery) feature.


This can be achieved by turning on continuous WAL archiving in PostgreSQL, issuing SELECT pg_start_backup('label'); before taking the snapshot, and then issuing SELECT pg_stop_backup(); after the snapshot is completed.


For this you will also need another dedicated share under the "postgres" project just to hold the WAL archive files. (In my case I call it "pgwal", mounted as /pgwal/archive.)


To turn on continuous WAL archiving you need to set the following variables in postgresql.conf (and then restart the PostgreSQL server):



archive_mode = true
archive_command = 'test ! -f /var/postgres/8.3/data/backup_in_progress || cp -i %p /pgwal/archive/%f < /dev/null'
archive_timeout = 3600
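To sanity-check the archiving setup before relying on it (a sketch; note that the archive_command above only copies WAL while the backup_in_progress marker file exists):

touch $PGDATA/backup_in_progress
psql -c "select pg_switch_xlog();" postgres   # force a WAL segment switch
ls /pgwal/archive                             # a newly archived segment should appear
rm $PGDATA/backup_in_progress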

Then a sample backup/snapshot script using hot backup can look as follows:



#!/bin/sh
#PGDATA= /var/postgres/8.3/data; export PGDATA # if not set
date
echo "---------------------------------------------------------------"
echo "Indicating PostgreSQL for a hot Snapshot.."
echo "---------------------------------------------------------------"
touch $PGDATA/backup_in_progress
psql -c "select pg_start_backup('MyHotSnapShot');" postgres
echo "---------------------------------------------------------------"
echo "Deleting earlier snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres snapshots select testsnap destroy
echo "---------------------------------------------------------------"
echo "Taking a new snapshot..."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots snapshot testsnap
echo "---------------------------------------------------------------"
echo "Verifying the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad shares select postgres snapshots show
ssh -l root amberroad shares select postgres snapshots select testsnap show
echo "---------------------------------------------------------------"
echo "Indicating to PostgreSQL Server that Hot Snapshot is done."
echo "---------------------------------------------------------------"
psql -c "select pg_stop_backup();" postgres
rm $PGDATA/backup_in_progress
echo "---------------------------------------------------------------"
date

Restoring from a hot backup snapshot is also a bit tricky, since we need to decide whether to roll forward and up to what point. That is probably beyond the scope of this post, so I will just do a simple restore using all the files in the WAL archives.


First of all, restoring with PITR requires a recovery.conf file, which should contain the path to the WAL archives:



restore_command = 'cp /pgwal/archive/%f "%p"'

Also, since we will be rolling forward with the archive files, we need to remove all the old pg_xlog files, create an archive_status directory, remove the postmaster.pid file, and, as an artifact of our earlier script, remove the backup_in_progress marker file too.


(We will do the above steps as part of our restore script.)



A restore-from-snapshot script can look something like the following:



#!/bin/sh
#PGDATA= /var/postgres/8.3/data; export PGDATA # if not set
date
echo "Restoring the database"
echo "---------------------------------------------------------------"
echo "Shutting down the database..."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA stop
echo "---------------------------------------------------------------"
echo "Restoring the snapshot.."
echo "---------------------------------------------------------------"
ssh -l root amberroad confirm shares select postgres select pgdata snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pglog snapshots \
select testsnap rollback
ssh -l root amberroad confirm shares select postgres select pgtbs1 snapshots \
select testsnap rollback
#if there is reason to rollback walarchives then uncomment the next two lines
#ssh -l root amberroad confirm shares select postgres select pgwal snapshots \
#select testsnap rollback
echo "---------------------------------------------------------------"
echo "Make sure pg_xlog/* is empty ..."
echo "Make sure pg_xlog/archive_status exists..."
echo "Make sure postmaster.pid is removed"
echo "---------------------------------------------------------------"
rm $PGDATA/backup_in_progress
rm $PGDATA/postmaster.pid
rm $PGDATA/pg_xlog/*
mkdir -p $PGDATA/pg_xlog/archive_status
echo "---------------------------------------------------------------"
echo "Set recovery.conf for recover.."
echo "---------------------------------------------------------------"
echo restore_command = \'cp /pgwal/archive/%f "%p"\' > $PGDATA/recovery.conf
echo "---------------------------------------------------------------"
echo "Restarting the database.."
echo "---------------------------------------------------------------"
pg_ctl -D $PGDATA start -l $PGDATA/server.log
echo "---------------------------------------------------------------"
echo "Database restored."
echo "---------------------------------------------------------------"
date


Of course, the above scripts may not be the best, but they give an idea of how one can use the snapshot features provided by Sun Storage 7000 Unified Storage Systems with PostgreSQL.





Monday, November 10, 2008

8x Quad-Core Opteron PostgreSQL/MySQL Server Try and Buy Offer

Somehow I must have missed this try-and-buy offer before.




Evaluate the Sun Fire X4600 with PostgreSQL, MySQL or Microsoft SQL Server 2005. The Sun Fire X4600 M2 server included in the program is available in two configurations: the first is 4x quad-core with 16GB RAM, and the second is 8x quad-core (yes, 32 cores in all) with 64GB RAM. Of course, it only comes with two disks, which means you may also need to "Try and Buy" the Sun StorageTek 2540 array or the Sun Storage 7110 Unified Storage System with it.


I would love to see some real-life deployment saturate the 32-core Opteron system using only PostgreSQL or MySQL as the database system in it.



PostgreSQL and Project Amber Road (Sun Storage 7000 Series)

The New Sun Microsystems announced a new line of OpenStorage devices. The new Sun Storage 7000 Unified Storage Systems (code name Amber Road) are storage appliances that use ZFS as the core technology and DTrace as the means of "understanding" what's happening within the device. This now allows systems which DO NOT have ZFS capabilities (Linux, Windows, etc.) to use and benefit from the features of ZFS and DTrace (albeit limited to the "storage" part only).


PostgreSQL (on Linux, Solaris, Windows) can use such systems through one of the standard protocols that these OpenStorage devices support:



  • NFS

  • CIFS (Only on Windows)

  • iSCSI


PostgreSQL on NFS has already been discussed at length in the PostgreSQL community (both the merits and the demerits of using NFS with PostgreSQL). Also, I haven't yet met anybody who says they run their PostgreSQL database on CIFS-mounted devices. So I think one might actually select iSCSI as the favorable protocol for such OpenStorage devices with PostgreSQL. (Correct me via comments if you think otherwise.)


Here are the quick steps I used to configure a PostgreSQL server (running Solaris 10) to use iSCSI devices exported from the Sun Storage 7000 Unified Storage platform.


In my case I created two iSCSI LUNs (one for PGDATA and one for pg_xlog) with file permissions for the postgres user. (This setup is generally done using the BUI, the Browser User Interface of the OpenStorage "Amber Road" device.)


Now on the PostgreSQL Server I did the following to setup iSCSI initiators and LUNS:





  1. Add and display the iSCSI target discovery address by giving the IP of the "Amber Road" system:

    # iscsiadm add discovery-address 10.9.168.93
    # iscsiadm list discovery-address
    Discovery Address: 10.9.168.93:3260




  2. Display the iSCSI targets discovered from the Amber Road system:

    # iscsiadm list discovery-address -v 10.9.168.93
    Discovery Address: 10.9.168.93:3260
    Target name: iqn.1986-03.com.sun:02:a4602145-85f8-64fa-c0ef-a059394d9a12
        Target address: 10.9.168.93:3260, 1
    Target name: iqn.1986-03.com.sun:02:0449398a-486f-4296-9716-bcba3c1be41c
        Target address: 10.9.168.93:3260, 1





  3. Enable and display static discovery:

    # iscsiadm modify discovery --static enable
    # iscsiadm list discovery
    Discovery:
        Static: enabled
        Send Targets: disabled
        iSNS: disabled




  4. Add the targets to the list of statically configured targets. A connection to a target will not be attempted unless the static configuration method of discovery has been enabled:

    # iscsiadm add static-config iqn.1986-03.com.sun:02:9e0b0e03-8823-eb7e-d449-f9c21930ba15,10.9.168.93
    # iscsiadm add static-config iqn.1986-03.com.sun:02:2cc4fe10-c7ba-697f-d95f-fa75efe50239,10.9.168.93




  5. Use Solaris devfsadm(1M) to create the iSCSI device nodes:

    # devfsadm -i iscsi





  6. Use the format(1M) command to access the iSCSI disks. The disks to select contain /scsi_vhci in their path names; local disks are listed before iSCSI disks in the format list. In the output below, disks 4 and 5 are the iSCSI disks:

    # format
    Searching for disks...done

    AVAILABLE DISK SELECTIONS:
        0. c0t0d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>
           /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@0,0
        1. c0t1d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>
           /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@1,0
        2. c0t2d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>
           /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@2,0
        3. c0t3d0 <DEFAULT cyl 17830 alt 2 hd 255 sec 63>
           /pci@0,0/pci8086,25f8@4/pci108e,286@0/sd@3,0
        4. c2t600144F04890703F0000144FA6CCAC00d0 <DEFAULT cyl 13051 alt 2 hd 255 sec 63>
           /scsi_vhci/disk@g600144f04890703f0000144fa6ccac00
        5. c2t600144F0489070250000144FA6CCAC00d0 <DEFAULT cyl 13051 alt 2 hd 255 sec 63>
           /scsi_vhci/disk@g600144f0489070250000144fa6ccac00




  7. For UFS, follow this procedure:

    1. Create file systems on the iSCSI disks:

      # newfs /dev/rdsk/c2t600144F04890703F0000144FA6CCAC00d0s0
      # newfs /dev/rdsk/c2t600144F0489070250000144FA6CCAC00d0s0

    2. Make the mount points:

      # mkdir /pgdata
      # mkdir /pglog

    3. Mount the iSCSI LUNs:

      # mount -F ufs -o forcedirectio /dev/dsk/c2t600144F04890703F0000144FA6CCAC00d0s0 /pgdata
      # mount -F ufs -o forcedirectio /dev/dsk/c2t600144F0489070250000144FA6CCAC00d0s0 /pglog








  8. For ZFS, create the zpools as follows:

    # zpool create pgdata c2t600144F04890703F0000144FA6CCAC00d0s0
    # zpool create pglog c2t600144F0489070250000144FA6CCAC00d0s0









  9. For ZFS with a read cache and separate intent log (OpenSolaris 2008.05 and later) using local disks:

    # zpool create pgdata c2t600144F04890703F0000144FA6CCAC00d0s0 cache c0t1d0s1 log c0t1d0s0
    # zpool create pglog c2t600144F0489070250000144FA6CCAC00d0s0 cache c0t2d0s1 log c0t2d0s0







PostgreSQL Setup:

Many times UFS file systems have a "lost+found" directory, which causes PostgreSQL's initdb command to fail because the directory is not empty. Also, major versions of the database are incompatible with each other, so it makes sense to adopt a naming convention as follows and create a subdirectory within /pgdata and /pglog for each version you plan to use:

mkdir /pgdata/8.3 /pglog/8.3
mkdir /pgdata/8.2 /pglog/8.2
mkdir /pgdata/8.1 /pglog/8.1




Then for PostgreSQL 8.3 (Solaris 10 10/08) the step is as follows:

/usr/postgres/8.3/bin/initdb -D /pgdata/8.3 -X /pglog/8.3/pg_xlog




As for 8.2, the steps are as follows:

/usr/postgres/8.2/bin/initdb -D /pgdata/8.2
mv /pgdata/8.2/pg_xlog /pglog/8.2/
ln -s /pglog/8.2/pg_xlog /pgdata/8.2/pg_xlog




And for the default 8.1 the steps are similar:

/usr/bin/initdb -D /pgdata/8.1
mv /pgdata/8.1/pg_xlog /pglog/8.1/
ln -s /pglog/8.1/pg_xlog /pgdata/8.1/pg_xlog


And then use the pg_ctl from the version of PostgreSQL being used to start and stop the server:

pg_ctl start -D /pgdata/8.3 -l /pgdata/8.3/server.log
pg_ctl stop -D /pgdata/8.3



In another blog post I will explain how to take snapshots with PostgreSQL and the OpenStorage devices.




Wednesday, November 05, 2008

Yes You Can - Save $$$ on cost of Proprietary Databases

The New Sun Microsystems announced a new GlassFish/MySQL/OpenSolaris SPECjAppServer2004 result today. The real highlight is that all the software used in this benchmark is Open Source Software. It is a win for Open Source Software, including open source databases like PostgreSQL and MySQL. We need more such benchmarks to highlight the exorbitant prices charged by proprietary database and other software vendors, who force customers to hand them the major share of their IT budget.


Tom Daly's blog entry highlights that in terms of price/performance, the proprietary database vendors (who conveniently also happen to be the application server vendors) charge a much higher cost, about 10X including all hardware and software, for the same performance that can be obtained with open source database alternatives like PostgreSQL and MySQL. Check BM Seer's blog entry for more comparisons.





SPEC required disclosure: SPEC, SPECjAppServer reg tm of Standard Performance Evaluation Corporation. Results from www.spec.org as of 5th Nov 2008. All comparisons are based on the SPEC SPECjAppServer2004 JOPS@Standard metric from www.spec.org or on pricing made using the bill of materials included in each SPECjAppServer2004 result.


Openbravo ERP with PostgreSQL 8.3 on Solaris 10 10/08

Solaris 10 10/08 (aka Solaris 10 Update 6) is now available. One of its new features is that PostgreSQL 8.3 is now bundled and available at /usr/postgres/8.3. I thought I would take this opportunity to show folks how to use the new version of PostgreSQL, using Openbravo ERP 2.40 as an example. (Note: this is specifically for Solaris 10 Update 6. If you are looking for instructions on how to install Openbravo and PostgreSQL on OpenSolaris, check the other blog entry.)



First of all, I recommend using the ZFS root feature now available in this update. On my Sun Blade 2000 I had to use the text installer via


boot cdrom - text


to install the system with ZFS boot.


Another recommendation is to put /var in a separate dataset. Why? All PostgreSQL databases typically go by default into /var/postgres, so a separate dataset provides finer granularity for the ZFS snapshot feature.


Once Solaris 10 Update 6 is installed, do the following to see all the PostgreSQL options available on it:


# svcs -a |grep postgres
disabled       Oct_31   svc:/application/database/postgresql:version_82_64bit
disabled       Oct_31   svc:/application/database/postgresql:version_82
disabled       Oct_31   svc:/application/database/postgresql:version_81
disabled       Oct_31   svc:/application/database/postgresql_83:default_64bit
disabled       Oct_31   svc:/application/database/postgresql_83:default_32bit


In my case I want to use a 32-bit instance of PostgreSQL 8.3, so I just do the following.


# svcadm enable svc:/application/database/postgresql_83:default_32bit
# svcs -a |grep postgres
disabled       Oct_31   svc:/application/database/postgresql:version_82_64bit
disabled       Oct_31   svc:/application/database/postgresql:version_82
disabled       Oct_31   svc:/application/database/postgresql:version_81
disabled       Oct_31   svc:/application/database/postgresql_83:default_64bit
online         16:56:35 svc:/application/database/postgresql_83:default_32bit


Note that when you run svcadm enable for any of the PostgreSQL instances, it first does an initdb and then does the equivalent of pg_ctl start. So only after the first enable of the service will you see the data directory initialized in /var/postgres/8.3/data (for 64-bit it will be data_64).
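If that first enable fails (for example, initdb refusing a non-empty directory), the SMF log is the first place to look; a sketch, using the conventional SMF log location:

# svcs -xv postgresql_83:default_32bit
# tail /var/svc/log/application-database-postgresql_83:default_32bit.log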


At this point I typically edit postgresql.conf, located at /var/postgres/8.3/data/postgresql.conf, and apply my basic tweaks:


checkpoint_segments = 16
shared_buffers = 128MB
wal_buffers = 512kB

and then restart the PostgreSQL server so the changes take effect before I start using the instance.


# svcadm restart svc:/application/database/postgresql_83:default_32bit


Great, now the PostgreSQL 8.3 server is up and running. Don't forget to use psql from /usr/postgres/8.3/bin instead of the default /usr/bin, which is PostgreSQL 8.1's psql.


Let's now put an application on top of this new PostgreSQL server. Openbravo ERP 2.40, which was recently released, also requires Tomcat.


There are two versions of Tomcat in Solaris 10 Update 6 (unlike OpenSolaris, which has none by default; when you install it via pkg you get the latest version). The two versions are the original Tomcat 4 and Tomcat 5.5.


/usr/apache/tomcat    - Tomcat 4


/usr/apache/tomcat55 - Tomcat 5.5 


For ease of use with Openbravo we want the newer Tomcat version, so let's set up the already bundled tomcat55 quickly as follows:


# cp /var/apache/tomcat55/conf/server.xml-example /var/apache/tomcat55/conf/server.xml


Openbravo ERP 2.40 requires ant 1.6.5 or higher. The ant in /usr/sfw/bin on Solaris 10 Update 6 is still the older version, 1.5.4. I downloaded apache-ant-1.7.1-bin.zip from ant.apache.org and unzipped it in /opt.
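The ant setup itself is quick (a sketch; adjust the download path to wherever you saved the zip):

# cd /opt
# unzip /tmp/apache-ant-1.7.1-bin.zip
# /opt/apache-ant-1.7.1/bin/ant -version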


Download the Openbravo ERP 2.40 Solaris (SPARC) installer available on its download site.


First of all, if you are not logged in as root but use "su -" to become root, then from another terminal as your normal userid give the X server display rights to root. As a shortcut I use "xhost localhost" to give permission to all users on localhost.


Then back as root


# chmod a+x OpenbravoERP_2.40-solaris-sparc-installer.bin
# DISPLAY=localhost:0.0; export DISPLAY
# ./OpenbravoERP_2.40-solaris-sparc-installer.bin



Here is my quick cheat sheet of answers for the questions that the Openbravo installation GUI asks (the ones in bold are changed from the defaults):




  • /opt/OpenbravoERP

  • /var/OpenbravoERP/AppsOpenbravo/attachments

  • Complete

  • Standard

  • /usr/jdk/instances/jdk1.5.0

  • /opt/apache-ant-1.7.1/bin/ant 

  • /var/apache/tomcat55

  • PostgreSQL

  • /usr/postgres/8.3/bin

  • localhost     5432

  • (Enter password for postgres user as "postgres" twice)

  • openbravo    tad     (Enter password for tad user  twice)

  • Context name: openbravo

  • Date format: DD MM YYYY, Date Separator -, Time format 24h, Time Separator :

  • Demo data: Y or N depending on your preferences


After gathering this information the installation GUI takes quite a bit of time to complete, especially if you choose to load the demo data. (Hopefully you made the PostgreSQL changes above to tune this loading.)


Notice that all user-related data (PostgreSQL, Tomcat, Openbravo attachments) now goes into /var. This allows us to take a snapshot of /var so we can revert to this installed image in case we accidentally corrupt user data or application settings.


Before taking the snapshot, it is good procedure to make sure the database and Tomcat are not running. Since we haven't started tomcat55 yet, we only have to bring down the PostgreSQL server.


# svcadm disable svc:/application/database/postgresql_83:default_32bit


# zfs snapshot rpool/ROOT/s10s_u6wos_07b/var@openbravoinstall



Once the installation completes, start Tomcat as follows:


# svcadm enable svc:/application/database/postgresql_83:default_32bit


# /usr/apache/tomcat55/bin/startup.sh


Once Tomcat has completely started (depending on the system this could take more than a couple of seconds), open a browser and go to


http://localhost:8080/openbravo


If everything goes right, you will see the login screen! Use Openbravo as the username and openbravo (all lower case) as the password to log in and set it up for your business.


If you want to roll back to this initial version, just do:


# /usr/apache/tomcat55/bin/shutdown.sh


# svcadm disable svc:/application/database/postgresql_83:default_32bit


# zfs rollback rpool/ROOT/s10s_u6wos_07b/var@openbravoinstall


# svcadm enable svc:/application/database/postgresql_83:default_32bit


# /usr/apache/tomcat55/bin/startup.sh




Once Tomcat has completely started, open a browser and start again:


http://localhost:8080/openbravo


More help on installation is available on their wiki site.





Thursday, October 30, 2008

PostgreSQL 8.3 in pkg.opensolaris.org repository (Mostly)

PostgreSQL 8.3.4 is now in the pkg.opensolaris.org repository (most of it). Since there is still no single manifest tag for all the files, I thought I would highlight the names of the packages (especially the main ones with name changes) for easy reference. Also, Hugo found that one of the packages is missing from the repository (SUNWpostgr-83-server-data-root), which needs tracking down.


The main packages to be installed (via "pkg install pkgname") are as follows:



  • SUNWpostgr-83-server

  • SUNWpostgr-83-libs  (pulled by SUNWpostgr-83-server as dependent)

  • SUNWpostgr-83-client

  • SUNWpostgr-jdbc

  • SUNWpostgr-83-docs

  • SUNWpostgr-83-contrib

  • SUNWpostgr-83-devel

  • SUNWpostgr-83-pl

  • SUNWpostgr-83-tcl

  • SUNWpgbouncer-pg83


Note that pgbouncer is now part of the repository as well.


Note that the JDBC package now includes the latest support and there is no separate version to download.


Update: SUNWpostgr-83-server-data-root is no longer required; it is now part of SUNWpostgr-83-server.





Tuesday, October 21, 2008

Make Open Source Project "runs best" on OpenSolaris

Often people ask me what I do for a living, and my short answer is "We make applications run best on Solaris". However, it is generally not clear to many people what "runs best" on Solaris, or now OpenSolaris, really means. The majority of people think it is all about performance, since "runs" can be measured in terms of "how fast". Well, that is just one part of the job. "Runs best" actually means multiple things. Let's take an example to see how.


Consider a business problem that can be solved with an IT-based solution. The first thing most people will do is find the right piece of software that solves the business problem; let's call it the BusinessApp. The BusinessApp now drives the food chain for the Technical Architect, who must work out how that BusinessApp can be deployed to achieve the goal. Let's assume a simple scenario where the BusinessApp requires a DatabaseApp, both running on some OperatingSystem on SomeHardware.


Let's recap the components



  1. BusinessApp (for example: Openbravo ERP)

  2. DatabaseApp (for example: PostgreSQL)

  3. OperatingSystem (for example: OpenSolaris)

  4. SomeHardware (for example: Intel Xeon Quad Core based system)


So the question is: which features really appeal to the Technical Architect?



  1. It should work - No brainer

  2. It should be easy to integrate the components and deploy it

  3. It should be easy to maintain

  4. It should perform meeting a minimum user expectations on how long to wait

  5. It should scale with the increase in the number of users


So "runs best" means all of the above in tandem. The priorities may differ slightly depending on the Technical Architect, but most will agree on meeting the first three before running a pilot to test the performance.


So how do we achieve "runs best" on OpenSolaris?



  1. First of all, we verify that the individual components (say PostgreSQL, Openbravo, etc.) work on OpenSolaris, which is what most people mean when they say "Oh! it works" or "Oh! it's supported", in ascending order of depth of support of the components in the stack.

  2. "Easy to integrate/deploy" is often missed, yet it is quite important for the "first user experience" of a deployer, who often makes recommendations based on what's easy to do. For example, to use Openbravo on OpenSolaris you need PostgreSQL, Tomcat, Ant, and a JDK, and you must integrate them together before you can deploy the Openbravo ERP application on OpenSolaris. Most of the time you need tight integration to ease the pain, which to many is more critical than performance problems themselves. This is mainly addressed by having good documentation and/or scripts which make sure the dependent components are preinstalled, plus a great installer to integrate them all together. Remember, pkg(5) doesn't do everything we want: it installs the components, but we still need good scripts to tie all the components together to act as a single integrated application. That is the effort with the biggest return on investment, since many people find this step make-or-break when evaluating whether a deployment is feasible.

  3. Yet another thing that is important but often sidelined is "easy to maintain". For example, starting Openbravo requires that the PostgreSQL server is running and that Tomcat is running. Instead of making a user run different scripts, a well-written SMF manifest makes it easy to start and stop the service on OpenSolaris, practically hands-free, along with the database. Another thing that really helps "easy to maintain" is the new ZFS snapshot feature, which in the next release of OpenSolaris (2008.11) will include Time Slider; it can be tweaked to make sure that the app and all its related components have consistent snapshots, which means you always have a good, working version to revert to in case something bad happens.

  4. This one is the most popular of all: "It performs great!" Of course, it doesn't perform all by itself. Every Technical Architect knows it is best to do their own performance test if their name is going to be tied to the solution. DTrace on OpenSolaris is exactly an aid in those scenarios, helping resolve problems, often before they occur in production, and fortunately even when a problem does make it to the production system.

  5. Of course, no solution is complete if the scaling process is not thought out beforehand. These are the what-if scenarios: what if, instead of 50 regular users, I get 30,000 users because HR fired off an email telling everyone to enter all their vacation days in the system before the end of the day? Such scenarios are unfortunately more common than not.


Again, all of this is part of the job of making an open source project (for that matter, any project) run best on OpenSolaris.










Thursday, October 02, 2008

The New Sun Microsystems

It is the start of the last quarter of 2008, and we engineers in ISV Engineering are still baffled when we hear new startups say "Oh, we are a startup, I don't think we can afford Sun Microsystems", or sometimes hear the same from old partners. It's like a flashback to 1999! Yes, we were selling $4 million servers based only on SPARC in 1999. However, in 2008 we sell many servers for much less than $1000. (HINT: join Sun Startup Essentials.) In fact, we now sell systems based on various CPU architectures: AMD Opteron, Intel Xeon, UltraSPARC T2, SPARC64 VII. We also have one of the biggest open source offerings, from MySQL, Java, OpenOffice.org, OpenSolaris, Glassfish, NetBeans, etc. (I for one cannot even remember all our open source projects.) So we figured it is essentially hard for non-Sun folks to distinguish the current Sun from the Sun of 1999. Our biggest competition in 2008 is the lingering image of Sun from 1999 as a seller of expensive SPARC systems.


It is hard to push that message out when you don't have an advertising budget. But we have to get the message out. So what am I going to do about it?


Just to show that we have changed since the last decade, the next time I have to introduce myself, I am going to say:


"I am Jignesh and I work for The New Sun Microsystems"


to highlight the fact and push the message out. I am changing the introduction in my blog and my email signature to reflect the same. I am hoping people will read or hear the "New" moniker and ask their favorite Sun contacts "What's new?", and that's exactly the question that I want those potential customers to ask us.




Update on Nov 6, 2008: Check out the New Sun.com website


Wednesday, October 01, 2008

Replacing Boot Disk which is running OpenSolaris

Recently the boot disk of my workstation running OpenSolaris 2008.05 started giving me media errors. Just for the record, the errors looked like this:



 WARNING: /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0 (sd2):
     Error for Command: read(10)                Error Level: Retryable
       Requested Block: 4213122                   Error Block: 4213126
       Vendor: FUJITSU                            Serial Number: AAG0P5300991
       Sense Key: Media_Error
       ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0
 WARNING: /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0 (sd2):
    Error for Command: read(10)                Error Level: Retryable
       Requested Block: 4213122                   Error Block: 4213126
       Vendor: FUJITSU                            Serial Number: AAG0P5300991
       Sense Key: Media_Error
       ASC: 0x11 (read retries exhausted), ASCQ: 0x1, FRU: 0x0


I did not want to wait till the whole disk died, so I started thinking about replacing it. I talked to our local lab manager about getting a replacement disk, and within a couple of days he came back with one. So with a replacement disk in my hand, and the failing boot disk still the primary drive in the system, I set out to figure out how to change disks without reinstalling the system.



First of all, I am using an EOL product called the Sun Java Workstation W2100z. Fortunately it has extra disk slots. After halting OpenSolaris 2008.05, I opened the system without a screwdriver (it has twist screws) and inserted the new disk into one of the available slots, which already had ribbon cables and a power connector for it. I booted OpenSolaris and could see both disks. Great; now what?


My plan was to partition the new disk exactly like my boot disk, so here are the steps I took to make it match.


 # format
Searching for disks...done


AVAILABLE DISK SELECTIONS:
       0. c5t0d0 <DEFAULT cyl 17879 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4/sd@0,0
       1. c6t0d0 <DEFAULT cyl 17846 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0
Specify disk (enter its number): 0
selecting c5t0d0
[disk formatted]
/dev/dsk/c5t0d0s0 is part of active ZFS pool rpool. Please see zpool(1M).
/dev/dsk/c5t0d0s1 is currently used by swap. Please see swap(1M).
format> par
partition> name
Enter table name (remember quotes): "myroot"
partition> pr
Current partition table (myroot):
Total disk cylinders available: 17879 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0       root    wm     262 - 17878      134.95GB    (17617/0/0) 283017105
  1       swap    wu       1 -   261        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 17878      136.96GB    (17879/0/0) 287226135
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition> quit
format> disk
AVAILABLE DISK SELECTIONS:
       0. c5t0d0 <DEFAULT cyl 17879 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4/sd@0,0
       1. c6t0d0 <DEFAULT cyl 17846 alt 2 hd 255 sec 63>
          /pci@5,0/pci1022,7450@4/pci108e,534d@4,1/sd@0,0
Specify disk (enter its number)[1]: 1
selecting c6t0d0
[disk formatted]
format> ver
WARNING - This disk may be in use by an application that has
      modified the fdisk table. Ensure that this disk is
      not currently in use before proceeding to use fdisk.
format> fdisk
No fdisk table exists. The default partition for the disk is:

  a 100% "SOLARIS System" partition

Type "y" to accept the default partition,  otherwise type "n" to edit the
 partition table.
y
format> par
partition> pr
Current partition table (original):
Total disk cylinders available: 17846 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0
  1 unassigned    wm       0                0         (0/0/0)             0
  2     backup    wu       0 - 17845      136.71GB    (17846/0/0) 286695990
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition>


Unfortunately, since my cylinder counts were somehow different, I could not use select to pick the saved "myroot" layout, and hence had to set the layout manually to be similar to my boot disk.


partition> 0
Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 262
Enter partition size[0b, 0c, 262e, 0.00mb, 0.00gb]: 17845e
partition> 1
Part      Tag    Flag     Cylinders         Size            Blocks
  1 unassigned    wm       0                0         (0/0/0)             0

Enter partition id tag[unassigned]:
Enter partition permission flags[wm]:
Enter new starting cyl[0]: 1
Enter partition size[0b, 0c, 1e, 0.00mb, 0.00gb]: 261e
partition> pr
Current partition table (unnamed):
Total disk cylinders available: 17846 + 2 (reserved cylinders)

Part      Tag    Flag     Cylinders         Size            Blocks
  0 unassigned    wm     262 - 17845      134.70GB    (17584/0/0) 282486960
  1 unassigned    wm       1 -   261        2.00GB    (261/0/0)     4192965
  2     backup    wu       0 - 17845      136.71GB    (17846/0/0) 286695990
  3 unassigned    wm       0                0         (0/0/0)             0
  4 unassigned    wm       0                0         (0/0/0)             0
  5 unassigned    wm       0                0         (0/0/0)             0
  6 unassigned    wm       0                0         (0/0/0)             0
  7 unassigned    wm       0                0         (0/0/0)             0
  8       boot    wu       0 -     0        7.84MB    (1/0/0)         16065
  9 unassigned    wm       0                0         (0/0/0)             0

partition> lab
Ready to label disk, continue? y

partition>



Now my disk layout looks like my boot disk's. Next I need to tell ZFS to move to the new disk. I wasn't comfortable doing all this in the GUI, so I reverted to character mode by issuing the following command:


# svcadm disable gdm


 In the character login mode,  I used the following step to move the ZFS rpool to the new disk


# zpool replace rpool c5t0d0s0 c6t0d0s0


(Sometimes you may have to use the force flag if there is already some data on c6t0d0s0:


# zpool replace -f rpool c5t0d0s0 c6t0d0s0



The command looked like it returned quite quickly, but I then found it spawns a background task (a resilver) to start moving the pool to the new disk. I observed it by using


# zpool iostat -v


to see the IOs happening from the old disk to the new


and also


# zpool status -v


Once it successfully completed replacing the disk, which can take a while, I made sure the swap location was also moved to the new disk in /etc/vfstab:


# vi /etc/vfstab


(replace c5t0d0s1 with c6t0d0s1)
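After the edit, the swap line in /etc/vfstab should look like this (standard vfstab columns: device to mount, device to fsck, mount point, FS type, fsck pass, mount-at-boot, options):

/dev/dsk/c6t0d0s1   -   -   swap   -   no   -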

Now, to make sure GRUB is also available on the new disk and knows where the new ZFS root will be located:



# cd /boot/grub


# installgrub stage1 stage2 /dev/rdsk/c6t0d0s0


As a final step before rebooting the operating system, I made sure the boot archive (mini-root) was updated with the new information:


# bootadm update-archive


And now I rebooted the system. Here there is a bit of a timing issue, since I needed to press F2 to get into the BIOS boot menu and select the other disk to boot from. Booting from the other hard disk and seeing a working GRUB menu gave me a positive feeling that things were going right. Selecting the OpenSolaris installation started booting Solaris again. Since I had disabled the GUI login, it took me to the login prompt. I logged in using my primary id and then used "su" to get a root shell.


I checked using "zpool status -v" and "swap -l" to verify that rpool and swap were pointing to the new disk. Once I was satisfied, I enabled GUI login as follows:


# svcadm enable gdm; exit


(Note: as soon as you enable the GUI via gdm, it starts X Windows and leaves your shell hanging logged in. Hence the extra ";exit" to immediately exit after starting the GUI.)


Then I took the system down again to remove my dying disk from it and sent it back to my lab manager.