Thursday, February 07, 2008

Best Practices with PostgreSQL 8.3 on Solaris

With the release of PostgreSQL 8.3, its time for people to start deploying it in their environments. So this post is primarily for people planning to use PostgreSQL 8.3 on Solaris.  Now PostgreSQL is bundled in Solaris. So how will PostgreSQL 8.3 fit into Solaris?  PostgreSQL 8.3 should be soon integrated in Solaris Nevada train so it should appear within next few builds of Solaris Express. However since the Solaris 10 Update release takes more time to appear and unfortunately may not make it in the next immediate release. Hence this post caters to what can be done now in order to use PostgreSQL 8.3 on Solaris.

First of all, for people already using Solaris Nevada or Solaris Express, Developer Edition 1/08  can now download the 32-bit community binaries from download site. Yes, I believe, this is the first time when Solaris binaries are available for PostgreSQL immediately at the time of general release.  These binaries were built on Solaris Express and hence will not work on Solaris 10 8/07 release (or earlier). There are few reasons why this is the case (but I wont go in details of that).

However for Solaris 10 customers wanting to use it now also have an option. Download the source code of 8.3 from  Refer to a document written by Bjorn Munch on how to build your own PostgreSQL binaries (quite similar to the way the release team at Sun builds them but tested only on Solaris Express for now).  Bjorn does plan to update it for 64-bit binaries soon (and hopefully for Solaris 10 8/07).

I think 64-bit binaries of 8.3 are now essential since with the amount of memory available going to the 256GB range even with Opeteron processors (like the Sun Fire X4600), the 32-bit binaries are not enough to capitalize the RAM of the system. (Though one should not go overboard and set shared_memory in postgresql.conf to 200GB in such cases even with 64-bit builds.)


 Anyway here is a quick way of generating 64-bit binaries without bells and whistles around it. I am assuming you already Sun Studio 12 Compilers installed in /opt/SUNWspro/bin. (Make sure  to create a directory called /opt/postgres/8.3 and the users generating the binaries have write permissions on that directory)

$ PATH=$PATH:/opt/SUNWspro/bin:/usr/sfw/bin:/usr/ccs/bin:/usr/sbin; export PATH

$ cd postgreql-8.3.0

$ ./configure CC=cc --without-readline --prefix=/opt/postgres/8.3 CFLAGS="-xO3 -m64"   --enable-thread-safety --enable-dtrace DTRACEFLAGS=-64


$ gmake  

$ gmake install


So that concludes the section of how to get the binaries for PostgreSQL 8.3 for Solaris.

Now lets look at deployment practices for PostgreSQL 8.3 for Solaris. I am assuming that this is for a decent sized (atleast in single digit GBs if not in 100s of GBs) database.

First of all the database user that will be running PostgreSQL 8.3 server should have rights to use resources required to run postgres.

Lets assume that the user is pguser then you need to set resources defined for pguser similar to what is shown below:

# projadd -U pguser user.pguser
# projmod -a -K "project.max-shm-ids=(priv,32768,deny)" user.pguser
# projmod -a -K "project.max-sem-ids=(priv,4000,deny)" user.pguser
# projmod -a -K "project.max-shm-memory=(priv,30813703372,deny)" user.pguser
# projmod -a -K "project.max-msg-ids=(priv,4000,deny)" user.pguser

 Now we need three filesystems (again, I am stressing that this is for big databases and hence not necessarily desirable if your database is only few MBs big). Why do we need three file systems? Based on various benchmarks like EAStress, iGen, pgBench and feedback from other users it seems we can classify mosts IO operations in three kinds of buckets

  1. PostgreSQL Data -  Cluster related with  execution information and temporary files

  2. Log files which are mostly only written and synced after every write

  3. Database and index files

People familiar with Solaris would know that many best practices will suggest to use forcedirectio with UFS. That suggestion is true for (2) and (3) above but for most operations on the rest of the types in $PGDATA which includes pg_clog, etc results in degraded performance. By putting them in three separate buckets its easy then to apply tunings specific to individual buckets and monitor them separately.

Right now the current best practice is to put:

  1. on default UFS filesystem (which is buffered)

  2. on UFS filesystem with forcedirectio enabled

  3. on UFS filesystem with forcedirectio enabled

(3) can be further broken down in still more separate tablespaces if needed.
The reason to separate (2) and (3) is since (2) is in constant state of syncing by separting it out on separate spindles and separate filesystem (3) is relieved from the stress.

Now (2) does not require much diskspace compared to (1) and (3). It primarily depends on checkpoint_segments. So a reasonable size could be about 10GB or so.

(1) is bit trickly. Mostly it is small. However temporarily files created (for example during index creation it will store the intermediate files here. Hence it needs to be relatively big though most of the temporary data will be cleared when the particular operation generating the files finishes.

(3) is where most of the database and indices (or indexes if you prefer) resides. (Again it can be broken into multiple filesystems if needed)

 Now for example consider you have created the following three filesystems

  1. /pg83db/data   with mount options in /etc/vfstab  -

  2. /pg83db/log     with mount options in /etc/vfstab forcedirectio

  3. /pg83db/mydb with mount options in /etc/vfstab forcedirectio

Now lets see how to setup PostgreSQL 8.3 to use these three file systems

First of  all make sure that the user (pguser in this case) has write permissions to all the three file systems. It i recommended to change ownership of the above directories to the user

# chown pguser /pg83db/data /pg83db/log /pg83db/mydb

Login as user (pguser in this case) and set the environment variables to pick up the right binaries and point to right data directory. (Ideally add it to .profile file so the next time you login, you have have to set it again.)

$ PATH=/opt/postgres/8.3/bin:$PATH; export PATH 

$ PGDATA=/pg83db/data; export PGDATA 

Confirm path being used is correct

$ which initdb


You can also confirm if it is 32-bit or 64-bit.

$ file `which initdb`


(In case of SPARC you will see)

/opt/postgres/8.3/bin/initdb:      ELF 64-bit MSB executable SPARCV9 Version 1, dynamically linked, not stripped      

 (In case of x64 you will see)

/opt/postgres/8.3/bin/initdb:      ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR CMOV FPU], dynamically linked, not stripped   


Then initialize the database with the environment variables in place (UPDATED: Using -X command)
$ initdb -X /pg83db/log/pg_xlog

$ cd $PGDATA

Edit pg_hba.conf to allow your identified clients to connect from remote server.

$ vi pg_hba.conf

For example add the following line to allow a client to connect from
host    all         all trust

Set popular tunables in postgresql.conf for high performance deployments with 10-100G configurations

$ vi postgresql.conf 

Modify the following parameters from their default value in postgresql.conf to similar values as needed. (Set shared_buffers to about 25%-50% of RAM to begin with). (Lets say major workloads are more OLTP like, where PostgreSQL is acting as a backend for J2EE applications)

listen_addresses = '*'
temp_buffers = 8MB
maintenance_work_mem = 256MB
wal_sync_method = fdatasync        
synchronous_commit = false
full_page_writes = off                  
wal_buffers = 512      
commit_delay = 10   
checkpoint_segments = 128

Updates: synchronous_commit is primarily for PostgreSQL users who generally turn fsync=off. There is a potential of losing 3X wal_writer_delay but the good thing is database is generally consistent even after failure. But it is not meant for mission critical systems or financial related transactions.

Setting commit_delay is a big win for OLTP workloads since this allows multiple transactions to be commited together (and hence also cutting the number of IOPS to the log file system) at the cost of adding probably a 1ms or so to the transactions when the load is pretty light to make an impact.

Turning full_page_writes off enables significant reduction in IOps on the log device and improving response times of transactions with the reduced load on the log device. (Of course there is a risk of partial writes related to it. But recommended when you have a decent storage array)

(More note for other workloads like DSS coming soon..)

Now we are ready to start PostgreSQL 8.3 for the first time on the server

$ pg_ctl start -l /pg83db/data/logfile

Now before creating the database, create a tablespace on /pg83db/mydb to use it for data tables and index files for the database to be created.

$ psql postgres -c “create tablespace datatbs location '/pg83db/mydb'”

$ psql postgres -c “create database mydb with  tablespace datatbs”

Now all tables and indexes created by default will be created in /pg83db/mydb

With the above setup we have avoided most of the general pitfalls and have a well configured starting database for your application. There may be needs for tweaking it more based on the workload on the database but most of them have already been applied.


Post a Comment