Monday, October 29, 2007

PostgreSQL wal_sync_method and O_DIRECT on Solaris

Starting with PostgreSQL 8.2, I observed in the documentation that if wal_sync_method is set to open_datasync (O_DSYNC) or open_sync (O_SYNC) then PostgreSQL will set O_DIRECT flag (to bypass file system buffer) where O_DIRECT flag is supported. PostgreSQL does that by checking if O_DIRECT is defined and enables it accordingly.


Thats great, since both O_DSYNC and O_SYNC are  synchronous flags, there is no use for file system to cache it (maybe not true for reads). But certainly by default, if it is not directly writing to the disks, it generally is not getting the best response time on the completion of the write calls. Even on Solaris, it is generally recommended that if you use O_DSYNC flags on your files that are opened, then it is better to write directly to the underlying disks.


However there is a small problem. O_DIRECT flag is not supported on Solaris. Hence the assumption that PostgreSQL while using wal_sync_method as open_datasync or open_sync is writing directly to the disks on Solaris is not true. There is another api  directio(3C) which needs to be used on Solaris. Solaris  has no other way of knowing that the application is requesting to bypass the file system buffer cache. (Only other alternative is for Solaris administrator to use forcedirectio as mount options but that does it at the filesystem level impacting all files on that filesystem.)


directio(3C) API was introduced in Solaris 8 and hence applications using it should compile for Solaris 8,9,10 and all OpenSolaris based distributions.


I did a quick test by just modifying BasicFileOpen function in fd.c in PostgreSQL 8.3beta1 to advise the directio(fd,DIRECTIO_ON) while still using wal_sync_method=open_datasync and saw performance improvements with the recompiled application. Ofcourse in my small quick test it turned on DIRECTIO for all files and that is not something that we want (remember the CLOG Buffer thrashing from couple of days ago?). Looks like we need a hacker to modify the code to advise DIRECTIO_ON for XLOG, datafile and the index files when open_datasync or open_sync is used as wal_sync_method and fsync is enabled.


 


No comments: