Wednesday, August 10, 2005

Postgres: What needs to be improved?



Open source databases are slowly and steadily gaining prominence. However, I think they could rise much faster, especially in enterprises, if they overcame some of the limitations of their current releases. In today's post I will focus on PostgreSQL. Coming from the DSS (Decision Support Systems) side of the world, I am focusing (for this post) on how to make PostgreSQL friendlier for data-warehousing workloads.



I will be happy to get feedback on this list, whether it is workarounds for these issues or a chance to brainstorm with a PostgreSQL.org contributor on fixing them. Look at this list as my TO-DO list of problems to investigate and solve with PostgreSQL.



PostgreSQL Query Processor is not SMP aware

Each connection to the database does everything in a single process. Since PostgreSQL is not threaded, it can use only one CPU per connection. This is not a limitation for OLTP applications, where you have many concurrent connections, but for DSS queries, which are expected to run for a long time, it is a killer. The current workarounds are 1-CPU zones/servers, or MPP extensions like GreenPlum's DeepGreen MPP, which of course come at a cost. Making each connection a multi-threaded process would really help PostgreSQL here: multiple worker threads could be dedicated to reading, writing, lock management, data filtering, and so on.
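
Until then, one manual workaround is to parallelize from the client side: split a big scan by key range across several connections, since each connection gets its own backend process and therefore its own CPU. A minimal sketch, assuming a hypothetical sales table with an integer id key; the client has to combine the partial results itself:

    -- Session 1 (connection 1, runs in its own backend process):
    SELECT count(*) FROM sales WHERE id < 5000000 AND region = 'EMEA';

    -- Session 2 (separate connection, runs concurrently on another CPU):
    SELECT count(*) FROM sales WHERE id >= 5000000 AND region = 'EMEA';

    -- The client adds the two partial counts together.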



PostgreSQL IO Management

A single connection's IO can perform really well on PostgreSQL; however, as multiple connections try to read/write the same table, performance suddenly starts to drop. PostgreSQL 8 has a background writer, and the effect is easy to reproduce with it. If you increase the background writer's wakeup frequency to relieve a connection that is busy COPYing or loading data, then DTrace on Solaris 10 shows the processes suddenly start fighting over semaphores rather than writing the data. But if you reduce the background writer's frequency, your connection spends more of its own time on writes and cannot do useful work in parallel, such as parsing data from the input file on the server's idle CPUs. Again, multi-threading might help here: once the write function is delegated to a thread, the parsing thread can continue reading from the source files.
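
For reference, the knobs I was trading off against each other are the PostgreSQL 8.0 background-writer settings in postgresql.conf. The values below are the 8.0 defaults, shown for orientation rather than as recommendations:

    # postgresql.conf -- PostgreSQL 8.0 background writer settings
    bgwriter_delay = 200       # ms between wakeups; lower = wakes more often
    bgwriter_percent = 1       # % of dirty buffers scanned per round
    bgwriter_maxpages = 100    # cap on buffers written per round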



PostgreSQL File Fragmentation

PostgreSQL, unlike the commercial databases, does not allow database files to be pre-grown to a certain size. So if you are loading multiple tables via different connections, two things hurt scalability: one is the semaphore locking required to perform IO to the database files, and the other is file fragmentation, since PostgreSQL creates all tables in the same file system and grows the files as needed. If two tables are being loaded, both files are growing at the "same" time, and the growth is typically serialized as blocks are allocated to each file one at a time; the blocks therefore end up dispersed rather than contiguous. How does this hurt? Compare full-table-scan times and you can easily see huge degradations (I have seen about 50%). In practice this means you have to load one table at a time. If there were a way to pre-grow the space for the tables, loading multiple tables simultaneously would be much easier. (Of course the semaphore problem is still there, and that locking needs to become more granular too.) Duh... I forgot the workaround here: TABLESPACES are finally available in PostgreSQL 8. But the semaphore problems still exist, and pre-growing files would still help a lot, since growing the files happens inside your single connection's one-process timeline.
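
Since tablespaces are there in PostgreSQL 8, one mitigation is to give each concurrently loaded table its own file system, so the two growing files never interleave blocks. A sketch, with hypothetical paths and table names:

    -- Each bulk-loaded table gets its own file system (PostgreSQL 8.0+)
    CREATE TABLESPACE load1 LOCATION '/fs1/pgdata';
    CREATE TABLESPACE load2 LOCATION '/fs2/pgdata';

    CREATE TABLE sales_2004 (id integer, amount numeric) TABLESPACE load1;
    CREATE TABLE sales_2005 (id integer, amount numeric) TABLESPACE load2;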



PostgreSQL Index Scans

The symptoms here are a bit vague and very query-plan specific. However, even in cases where it looks like the complete index has to be read, PostgreSQL spends a lot of time in the read's lseek compared to the actual read transfer. (Background: a read call does an lseek to position the disk head and then transfers the data back to the calling function.) If the index reads were sequential for multi-row queries, it should not be spending this much time in lseeks, so the index scans need to be optimized.
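
One experiment worth trying here is physically reordering the table in index order with CLUSTER, so that reads driven by that index become mostly sequential IO. A sketch with hypothetical names; note that CLUSTER is a one-time rewrite, and the ordering decays as rows are updated:

    -- PostgreSQL 8.0 syntax: rewrite table sales in sales_date_idx order
    CLUSTER sales_date_idx ON sales;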



Using Multiple Indexes for the Same Query

Currently each query can use only one index per table in a particular join/subquery. However, depending on the predicates, there are times when multiple indexes could narrow down the rows quickly, instead of using one index, fetching all rows that match it, and then applying the remaining filter criteria. This is changing: Bizgres recently announced bitmap index scans, used when the query planner estimates a query will return less than 5% of the rows.
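
To illustrate what combining indexes buys, consider a query with selective predicates on two differently indexed columns (hypothetical names throughout); a bitmap scan can AND the two index results together before touching the table, instead of scanning one index and filtering the rest:

    CREATE INDEX sales_region_idx ON sales (region);
    CREATE INDEX sales_month_idx ON sales (month);

    -- With bitmap index scans, both indexes can be combined (a BitmapAnd)
    -- so only rows matching both predicates are fetched from the heap.
    SELECT sum(amount) FROM sales WHERE region = 'EMEA' AND month = 7;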





Those are the core changes. Apart from these, I would like to see the following features, which would really help when migrating existing installations.



Provide COPY statement with a NOLOG option

This option would bypass WAL during bulk loading into a brand-new table. It basically means that if there is a failure the table cannot be recovered and has to be reloaded; but since it was empty to begin with, and if the input feed rate rose from the current 5-10MB/sec to, say, 20-30MB/sec, I would be willing to take that risk. Also, if the tablespaces are pre-grown, I might even achieve 35-40MB/sec.
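
To be concrete, the syntax I have in mind would look something like this; NOLOG is my own proposed option, not anything PostgreSQL actually accepts today:

    -- Hypothetical: skip WAL while bulk loading a freshly created, empty table
    COPY sales FROM '/feeds/sales_2005.dat' WITH DELIMITER '|' NOLOG;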



Actual Query runtime analysis

This is a potential DTrace provider for queries in PostgreSQL. It would really help if it could provide an actual breakdown of the time spent in each operation, the columns involved, and what the executor was trying to get from them, in order to better understand the planner and the actual results. This would help a lot in refining queries.
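
The closest thing available today is EXPLAIN ANALYZE, which executes the query and reports actual times and row counts per plan node; a DTrace provider could go deeper than plan-node granularity. Same hypothetical sales table as above:

    -- Runs the query and shows actual (not estimated) time per plan node
    EXPLAIN ANALYZE
    SELECT sum(amount) FROM sales WHERE region = 'EMEA';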




I am getting close to "brain fried, core dumped" right now. If I think of other things I will just add them to this blog. This is based on my understanding of PostgreSQL; feel free to correct me and expand my knowledge of Postgres.



Next time, maybe, it will be MySQL's turn.


2 comments:

Alberto Juez said...

Hi!
My name is Alberto and I'm from Spain.
I've found your comments very interesting.
Recently I've been reading the PostgreSQL documentation, and in version 8.1 (currently in beta) you can find some improvements for SMP systems:




(from http://developer.postgresql.org/docs/postgres/release.html#RELEASE-8-1)




Improve concurrent access to the shared buffer cache (Tom)


Access to the shared buffer cache was identified as a significant scalability problem, particularly on multi-CPU systems. In this release, the way that locking is done in the buffer manager has been overhauled to reduce lock contention and improve scalability. The buffer manager has also been changed to use a "clock sweep" replacement policy.

Jim Nasby said...

Actually, version 8.1 will include bitmap index scans. This is the same code that's in Bizgres; Bizgres just picked it up earlier.
8.1 also allows creating a table without logging via SELECT INTO. Hopefully this will be extended to COPY in 8.2.
The file pre-allocation is interesting because this is an area where the PostgreSQL development community has felt it's OK to rely on the OS to handle that work. Unlike older databases, PostgreSQL relies on the advances that have been made in operating systems in the past 10-20 years. This is why they don't try and use all the memory on the server, for example.
The question of pre-allocating storage space has come up recently on the mailing lists, and one of the arguments against it was that modern filesystems allow enough room between files for fragmentation not to be an issue. Perhaps this is an incorrect assumption. I currently have access to a v40z attached to a full 3510, so maybe I can put this theory to the test, at least on UFS.