Sunday, November 21, 2010

PostgreSQL 9.0 Simple Select Scaling using sysbench

While I still haven't found a working way to run the sysbench read-write test, I decided to continue my sysbench testing with PostgreSQL 9.0. This time I selected oltp-test-mode=simple, which essentially does a SELECT lookup based on a primary key. The basic idea is to see how far you can scale when all you are doing is a single SELECT statement.
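
For context, the runs behind the numbers below are of this general shape. This is only a sketch, not the exact command line I used: the option names are from sysbench 0.4.x, and the connection details, table size and client counts are placeholders.

    # Hypothetical sysbench invocation (placeholders, not my exact settings).
    sysbench --test=oltp --db-driver=pgsql \
             --pgsql-host=localhost --pgsql-user=postgres --pgsql-db=sbtest \
             --oltp-table-size=1000000 prepare

    # Run the point-select ("simple") test at increasing client counts.
    for clients in 1 4 8 16 32 40 48 64; do
      sysbench --test=oltp --db-driver=pgsql \
               --pgsql-host=localhost --pgsql-user=postgres --pgsql-db=sbtest \
               --oltp-test-mode=simple \
               --num-threads=$clients --max-time=300 --max-requests=0 run
    done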

Here are my initial results with PostgreSQL 9.0:



This is still a work in progress in some ways, since this test has never been done before with PostgreSQL 9.0 (or at least I haven't seen anybody publish anything about it before).

Anyway, my real goal is not just to put the numbers up here but to understand what is happening. At around 40-48 clients we seem to peak out and hit a wall as far as scaling goes. The statement is a pretty simple SELECT from sbtest on a randomly generated primary key value, as sketched below.
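
Each request in simple mode boils down to a single-row primary-key lookup of roughly the following shape. The table and column names are the standard sysbench sbtest schema, and the psql session is purely an illustration: sysbench's pgsql driver actually sends this as a prepared statement over the extended protocol, which is why exec_bind_message shows up in the stack traces further down.

    # Illustrative only: the per-request query shape of the "simple" test.
    psql -d sbtest -c \
      'PREPARE pk_lookup (int) AS SELECT c FROM sbtest WHERE id = $1; EXECUTE pk_lookup(1);'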

This is where I miss DTrace and OpenSolaris. Now that I am working on Linux, I thought of trying it out with SystemTap. Setting up a working SystemTap was itself a big challenge and took me some time to get running on my existing kernel. (Boy, Linux should get its act together on user-level tracing. It is not as omnipresent as DTrace is on Solaris.)
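
For anyone hitting the same setup pains, two generic sanity checks (nothing PostgreSQL-specific) are worth running first: a trivial kernel-side probe to confirm stap can build and load a module for the running kernel, and a trivial user-space probe to confirm utrace/uprobes support, which everything below depends on.

    # 1. Kernel-side smoke test; needs kernel-devel/debuginfo matching `uname -r`.
    stap -v -e 'probe begin { printf("systemtap works\n"); exit() }'

    # 2. User-space smoke test: attach to the target command started with -c.
    stap -e 'probe process.begin { printf("user-space probing works: %s\n", execname()); exit() }' \
         -c /bin/true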

I converted my old lockstat utility to work with SystemTap. The script seems to work, but SystemTap can act funny sometimes: don't exit from it abruptly, otherwise it may send a "wrong" signal to the postgres backend, which then just commits suicide because it cannot figure out what to do with such a "trace/debug" signal.

            LOCKNAME  LWID   M  W/A  COUNT  SUM-TIME(us)  MAX(us)  AVG-TIME(us)
         LockMgrLock    45  Ex    W  85343     469682510    13152          5503
         LockMgrLock    57  Ex    W  57547      30903727     8313           537
         LockMgrLock    44  Ex    W    390         34061     1670            87
         LockMgrLock    59  Ex    W    375         41570     2032           110
         LockMgrLock    56  Ex    W    361         39685     1889           109
         LockMgrLock    47  Ex    W    344         24548     1564            71
         LockMgrLock    54  Ex    W    335         67770     2319           202
         LockMgrLock    50  Ex    W    325         44213     1690           136
         LockMgrLock    49  Ex    W    325         39280     1475           120
         LockMgrLock    55  Ex    W    323         39448     1584           122
         LockMgrLock    48  Ex    W    323         26982     1669            83

What you see above are the top waits on LWLocks by count, along with the average and maximum wait time for each particular LWLock.
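
My lockstat script itself is not reproduced here, but the same kind of data can be collected roughly as sketched below, using the LWLock wait probe points that PostgreSQL exposes when built with --enable-dtrace. The binary path is the one from this setup; adjust it (and the output format, which only approximates the table above) for your own install.

    # Rough sketch: time LWLock waits per (lock id, mode) using the built-in
    # lwlock__wait__start / lwlock__wait__done probe points.
    stap -e '
    global wait_start, waits

    probe process("/usr/local/aurora-1.0/bin/postgres").mark("lwlock__wait__start") {
        /* $arg1 = lock id, $arg2 = lock mode (0 is exclusive in 9.0) */
        wait_start[tid(), $arg1] = gettimeofday_us()
    }

    probe process("/usr/local/aurora-1.0/bin/postgres").mark("lwlock__wait__done") {
        t = wait_start[tid(), $arg1]
        if (t) {
            waits[$arg1, $arg2] <<< gettimeofday_us() - t
            delete wait_start[tid(), $arg1]
        }
    }

    probe end {
        foreach ([lockid, mode] in waits)
            printf("lock %d mode %d: count %d sum %d us max %d us avg %d us\n",
                   lockid, mode, @count(waits[lockid, mode]),
                   @sum(waits[lockid, mode]), @max(waits[lockid, mode]),
                   @avg(waits[lockid, mode]))
    }'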

(Thinking it might be related to NUM_LOCK_PARTITIONS, I did some experiments with different sizes, but since all the lookups go against the same table it does not help here.)

Then I modified my lockstat script slightly to collect stack traces for those two locks.
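
The modified script is not shown here either, but the idea, roughly, was along these lines. The lock ids and binary path are the ones from this run, and print_ubacktrace() needs debug symbols and unwind data for the postgres binary (hence -d and --ldd); under load this prints a lot, so aggregating the backtraces is wiser for longer runs.

    # Rough sketch: dump a user-space backtrace whenever a backend starts
    # waiting on one of the two hot lock ids (45 and 57).
    stap --ldd -d /usr/local/aurora-1.0/bin/postgres -e '
    probe process("/usr/local/aurora-1.0/bin/postgres").mark("lwlock__wait__start") {
        if ($arg1 == 45 || $arg1 == 57) {
            printf("Lock id:%d, LockMode:%d\n", $arg1, $arg2)
            print_ubacktrace()
        }
    }'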

The majority of those were caused by pretty much the same code path:

Lock id:45, LockMode:0
 0x000000000062412e : LWLockAcquire+0x25e/0x270 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000006228cc : LockAcquireExtended+0x2dc/0xa40 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000620788 : LockRelationOid+0x48/0x60 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000064be85 : AcquireExecutorLocks+0xd5/0x190 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000064c96a : RevalidateCachedPlan+0x5a/0x3b0 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000062ee64 : exec_bind_message+0x604/0xab0 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000630bbd : PostgresMain+0x82d/0x16e0 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000005f302e : ServerLoop+0x96e/0xcb0 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000005f3dec : PostmasterMain+0xa7c/0x1150 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000044e4f0 : main+0x370/0x430 [/usr/local/aurora-1.0/bin/postgres]

Lock id:57, LockMode:0
 0x000000000062412e : LWLockAcquire+0x25e/0x270 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000006228cc : LockAcquireExtended+0x2dc/0xa40 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000620788 : LockRelationOid+0x48/0x60 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000477425 : relation_open+0x55/0x90 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000047e5e3 : index_open+0x13/0x90 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000005732a5 : ExecInitIndexScan+0x125/0x1e0 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000560275 : ExecInitNode+0x135/0x290 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000055ed80 : standard_ExecutorStart+0x530/0xc70 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000063278b : PortalStart+0x1bb/0x380 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000062ee9c : exec_bind_message+0x63c/0xab0 [/usr/local/aurora-1.0/bin/postgres]
 0x0000000000630bbd : PostgresMain+0x82d/0x16e0 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000005f302e : ServerLoop+0x96e/0xcb0 [/usr/local/aurora-1.0/bin/postgres]
 0x00000000005f3dec : PostmasterMain+0xa7c/0x1150 [/usr/local/aurora-1.0/bin/postgres]
 0x000000000044e4f0 : main+0x370/0x430 [/usr/local/aurora-1.0/bin/postgres]


So what I understand is that we have two problems here:
1. RevalidateCachedPlan (the major bottleneck)
2. The AccessShareLock entry (the second bottleneck)

Well, at least now I know the areas where there seem to be scaling bottlenecks that can limit simple SELECT statement scaling. The question now is what to do about them. Back to the mailing list.

1 comment:

Frank Ch. Eigler said...

Hi, Jignesh. Thanks for working through the systemtap setup pains and getting some useful data out. Please be aware that the trace/debug signal issue is one that we've seen occasionally, and recent mainline (git systemtap) code has some related changes. If you see the same problems, you might give that a try.