
Friday, November 22, 2013

Feedback from my session at PgConf.EU

I recently gave a session at PgConf.EU.

It was interesting and encouraging to see the feedback from it.
Thanks to the people who took the time to give it.

Session feedback: My experience with embedding PostgreSQL

Speaker: Jignesh Shah

A total of 4 feedback entries have been submitted for this session.

Rankings

Topic importance (5 is highest, 1 is lowest)
Rating  Count
1       0
2       0
3       1
4       0
5       3

Content quality (5 is highest, 1 is lowest)
Rating  Count
1       0
2       0
3       0
4       0
5       4

Speaker knowledge (5 is highest, 1 is lowest)
Rating  Count
1       0
2       0
3       0
4       0
5       4

Speaker quality (5 is highest, 1 is lowest)
Rating  Count
1       0
2       0
3       0
4       1
5       3


Comments

  • Very good presentation - IMHO please add Oracle License Costs to your slides. As on the first view - Business people wont understand that they are variable and WAY HIGHER as expected.
  • One of the best talks this year.

Thursday, October 31, 2013

My experience with embedding PostgreSQL - #pgconfeu

At my current company, we embed PostgreSQL-based technologies in various applications shipped as shrink-wrapped software. In this session we talk about the experience of embedding PostgreSQL where it is not directly exposed to the end user, the issues encountered, and how they were resolved.
We also talk about the business reasons, the technical architecture of deployments, upgrades, and the security processes for working with embedded PostgreSQL databases.





Tuesday, September 17, 2013

PostgreSQL replication? I need temp tables for my reports

One of the things I frequently hear is that PostgreSQL users avoid running PostgreSQL replication because they want to offload reports that use temporary tables onto the slaves. Since PostgreSQL replicas are purely read-only, they cannot support temporary tables.
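
For illustration, this is roughly what happens if you try to create a temporary table on a hot standby (the exact error wording may vary by version):

postgres=# CREATE TEMP TABLE report_scratch (id int);
ERROR:  cannot execute CREATE TABLE in a read-only transaction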

There is a way to overcome this with postgres_fdw, the PostgreSQL Foreign Data Wrapper, which is much improved in the newly released PostgreSQL 9.3.
  • Create a master-slave setup with PostgreSQL 9.3 (with synchronous or asynchronous replication, as per your needs).
  • On the slave, set up another PostgreSQL 9.3 instance (on a different port) with postgres_fdw, and map all tables from the slave as foreign tables with the same names as their remote counterparts.
  • Run the reports that require temporary tables against this new instance (a sketch of the mapping follows below).
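
A minimal sketch of the mapping step, assuming the slave listens on port 5432 and the report instance on 5433; the database, table, and credential names here are invented for illustration. Note that in 9.3 each table has to be mapped with its own CREATE FOREIGN TABLE:

-- on the report instance (port 5433)
CREATE EXTENSION postgres_fdw;

CREATE SERVER replica_srv FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', port '5432', dbname 'salesdb');

CREATE USER MAPPING FOR report_user SERVER replica_srv
  OPTIONS (user 'report_user', password 'secret');

-- one foreign table per remote table, with the same name
CREATE FOREIGN TABLE orders (
  order_id   int,
  amount     numeric,
  created_at timestamptz
) SERVER replica_srv OPTIONS (table_name 'orders');

-- temporary tables now work, since this instance is read-write
CREATE TEMP TABLE big_orders AS
  SELECT * FROM orders WHERE amount > 1000;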
Of course there are a few caveats to this setup:
  • Query plans: currently they can be inefficient, but as postgres_fdw improves this will likely go away. In fact, more usage of this scenario will push it to be improved.
  • Lots of data movement: most DW reports do read a lot of rows. However, by setting this up on the same server, most of the traffic is loopback and does not go out on the wire.
  • More resources: this setup does require more memory/CPU on the server, but it is still cheaper, since managing such a server is simpler than other complex designs that achieve the same goal.
I would like to hear about your experiences with this too, so feel free to send me comments.

Tuesday, May 21, 2013

How to do Postgres Replication and Failover in VMware vFabric Data Director 2.7?


Last week VMware released vFabric Data Director 2.7. Among the many new features for various databases, I want to give a little more insight into my favorites, which concern Postgres.

One of the big feature additions, from a broad perspective, is support for Postgres 9.2 based managed database servers along with replication. Let's look briefly at how it is done.
First, the vFabric Postgres 9.2 base DBVM needs to be uploaded into the Data Director system resource pool and then converted into a template. Note that this template is different from the vFabric Postgres 9.2 virtual appliance and is available in the same location as the Data Director 2.7 download. This DBVM template is based on the latest PostgreSQL 9.2.4 core.
Next, once the template is loaded into the system resource pool, it will show up in the Base DBVMs section under System -> Manage and Monitor -> Templates -> Base DBVMs.

Then right-click on the base DBVM and select “Convert to Base DB Template”. Here you also have a new option to add more disks (think tablespaces in PostgreSQL) to the template. In Data Director 2.7, disks are added at the template level.


This process takes some time, after which the template should show up under Base DB Templates. In the Base DB Templates section, right-click on the newly created template and select “Validate”. This creates a test database to verify the template works before it is rolled out to the organizations and database groups. Once the validation succeeds, right-click, select “Assign it to the resource bundle”, and pick the resource bundle that will have access to the template (or create a new resource bundle to use it).
Note that the resource bundle creation steps offer yet another new way to separate out I/O onto separate datastores, since these may need different I/O characteristics.

The above image shows the various types of datastores for your OS, Backup, Data, and Logs (Data can be multiple locations if you need multiple tablespaces).

Now all the orgs/database groups using the resource bundle will see the new vPostgres 9.2 template.
In a sample demo I created a database called MyDB by answering a few wizard questions.
Now, in the DB list, right-click on the database to create more replicas.


A couple more quick wizard questions and a slave is ready. Similarly, you can create more slaves of the same master. Behind the scenes it does all the plumbing, from creating a new clone virtual machine from the master to setting up standard PostgreSQL replication on the newly created VM, but from the user's point of view it is all point, click, and done.

The database dashboard also has a new portlet showing a bit more information about the replication.

To make a slave the new master (it will also offer to move all the other slaves to the new master), select as follows:


After completion of the failover, the status shown will be similar to:

Here the original slave replica is now redirected to the new master. The original master is kept as-is for post-mortem or other activities. This also opens up new use cases, like separating a slave for test/DR purposes or separating “mirrors”, etc. Best of all, it uses all standard PostgreSQL replication.
Of course, one thing it does not do is re-purpose the original master as a slave of the new master. For more details, there is a perfect opportunity to find Heikki Linnakangas from VMware at PGCon in Ottawa this week and ask him the question after his session .. WHY??? :-)

Tuesday, May 14, 2013

How can PostgreSQL 9.3 beta1 help you?

PostgreSQL 9.3 beta1 is now available. Giving early access to software is always a good idea for testing out evolutionary, revolutionary, and radical ideas: unless it is field-tested, it has not gone through its trial by fire to be proven gold.

There are many new changes introduced in PostgreSQL 9.3 beta1, and I have a few favorites among them.

For example, disk page checksums to detect filesystem failures. In fact, this would allow VMware to use the now-standard disk page checksums instead of a custom feature. This much-debated feature is needed to identify silent bit corruption (or deter malicious corruption). Database administrators (not just PostgreSQL DBAs) have told me that they typically face at least one incident a year where a disk shows bit rot that goes unnoticed without any instrumentation to catch it.
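
Checksums are enabled when the cluster is initialized; a quick sketch (the data directory path is just an example):

$ initdb --data-checksums -D /var/lib/pgsql/9.3/data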

Another change that goes in the right direction is how PostgreSQL maps its shared memory. This small change means no kernel changes are needed to start the database with a bigger shared buffer pool, which removes one cookbook step from getting the database working. Considering that in this cloud world there are 100,000 VMs running databases, one less step is a huge productivity gain, since this step actually required privileges higher than those of the database instance owner.
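
Concretely, a familiar pre-9.3 step like the one below (the values are illustrative) is no longer needed, because 9.3 allocates its main shared memory region with mmap rather than System V segments:

# pre-9.3 only: raise System V limits before using a large shared_buffers
$ sudo sysctl -w kernel.shmmax=17179869184
$ sudo sysctl -w kernel.shmall=4194304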

Yet another favorite feature is custom background workers. This new mechanism is certainly popular in our team at VMware, where we are using it heavily: changes we had made to core have been moved into custom background workers deployed as extensions, which lets us stay aligned with core PostgreSQL and enable extra features as extensions when needed.

Next I want to talk about three features together: writeable foreign tables, the postgres_fdw foreign data wrapper, and automatically updatable views. Each of these features is very useful and generic on its own. Used together, however, they open new possibilities: multiple federated PostgreSQL database shards with a single logical view of the whole database as one. Quite powerful if you think about it. I hope to see people turn these fundamental features into the new derived features they now make possible.
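
A toy sketch of the combination, with all names invented for illustration and assuming a postgres_fdw server is already defined; in 9.3 a simple view over a single writeable foreign table should itself be updatable:

-- a writeable foreign table mapped from a shard (writes are new in 9.3)
CREATE FOREIGN TABLE customers_shard1 (
  id   int,
  name text
) SERVER shard1_srv OPTIONS (table_name 'customers');

-- a simple view over one relation is automatically updatable in 9.3
CREATE VIEW customers AS
  SELECT id, name FROM customers_shard1;

-- this INSERT travels through the view to the remote shard
INSERT INTO customers VALUES (42, 'Alice');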

Also, the new JSON functions help PostgreSQL on its evolution into a data platform not just for relational data but also for document data.
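
A trivial illustration of two of the new 9.3 JSON additions:

-- extract a field as text with the new ->> operator
SELECT '{"name": "PostgreSQL", "version": "9.3"}'::json ->> 'name';

-- expand a JSON object into key/value rows
SELECT * FROM json_each('{"a": 1, "b": 2}');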


While I have barely scratched the surface of all the new features in PostgreSQL 9.3 beta1, I am already excited about this release and the possibilities I see going forward in the world of data.

Thursday, November 15, 2012

A Step Forward

Recently I upgraded my "lab" setup.
It currently looks as follows:

* 2x physical hosts running vSphere 5.1
* Controlled by vCenter 5.1 Server Appliance, backed by an embedded Postgres database instance
* Monitored by vCenter Operations, which has two embedded Postgres database instances in the vApp
* To monitor my VMs, I installed vFabric Hyperic 5.0, also running an embedded Postgres database VM
* The DBaaS provider vFabric Data Director 2.5 is also installed, with its embedded Postgres database instance running too

Now for my VMs:
* vFabric Postgres 9.1.6.0 VM integrated with vSphere HA
* My Linux developer VM running PostgreSQL 9.2.1

If you get everything you need with Postgres, why even QA other databases anymore?
Well, vFabric Hyperic took the first bold step forward.





Wednesday, September 19, 2012

PGOpen 2012: DVDStore Benchmark and PostgreSQL

Here are my slides to go with the demo I did in my DVDStore Benchmark and PostgreSQL session:
http://portal.sliderocket.com/vmware/PGOpen2012_DVDStore


Other session presentations are available at https://wiki.postgresql.org/wiki/Postgres_Open_2012




Tuesday, December 20, 2011

Using DVDStore with PostgreSQL

We now have support for PostgreSQL in the popular DVDStore benchmark, which stresses a database using an emulated DVD store e-commerce website. The DVDStore benchmark is maintained by Dave Jaffe (Dell) and Todd Muirhead (VMware). It is an open-source database test kit. The beauty of the kit is that it allows the same web application to be deployed as any of:
  1. Java/Tomcat connecting to the database,
  2. Web Server/PHP connecting to the database,
  3. IIS/ASP.NET connecting to the database, or
  4. Direct connect to the database, invoking the business logic as stored procedures stored in the database itself.

Currently the PostgreSQL implementation details are as follows:
  1. Java/Tomcat using the PostgreSQL JDBC driver,
  2. Web Server/PHP using the PHP-postgres modules, which use libpq,
  3. Currently there is no IIS/ASP.NET web app implementation for PostgreSQL,
  4. Direct connect to the PostgreSQL database with the business logic implemented in stored procedures; the driver, however, is implemented in .NET C# and requires Npgsql 2.0.11.0.

Setup instructions for the database are relatively easy (a sample run is sketched after the list):
  1. Download ds21.tar.gz and ds21_postgresql.tar.gz from http://linux.dell.com/dvdstore/
  2. Unzip them on the system running PostgreSQL.
  3. The default data size is 10MB. If you want a different size, execute 'perl Install_DVDStore.pl' in the ds2 directory. (This expects perl to be available on the system. I used 100, MB, PGSQL, and LINUX respectively for the options.)
  4. Assuming you are logged on as the DB owner and the database is on localhost at port 5432, execute the script pgsql_create_all.sh in the ds2/pgsqlds2 directory. It will create a database "ds2" and two users "ds2/ds2" and "web/web", create tables, load tables, create indexes, update sequences, and finally run analyze. (The script needs to be modified slightly if the database is already hardened and you want to control the creation of the database and the users.)
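
A minimal sketch of those steps on the database host (the answers to the Install_DVDStore.pl prompts are the ones from my run):

$ tar xzf ds21.tar.gz
$ tar xzf ds21_postgresql.tar.gz
$ cd ds2
$ perl Install_DVDStore.pl      # answered: 100, MB, PGSQL, LINUX
$ cd pgsqlds2
$ sh pgsql_create_all.sh        # creates "ds2", loads data, runs analyze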

Setup for the actual load driver is probably easiest on a separate Windows machine, as it was designed for the .NET platform.
  1. Download and install Windows SDK v6.1 and the .NET 3.5 framework on a Windows client machine.
  2. Once installed, start the CMD prompt from Programs -> Windows SDK v6.1 -> CMD Prompt.
  3. Verify the above CMD prompt has the path set up for gacutil (try 'gacutil /l').
  4. Download Npgsql 2.0.11 for msnet35 and install the DLLs using gacutil.exe (note: other versions of Npgsql may have issues):
    •  gacutil /i Npgsql.dll
    •  gacutil /i Mono.Security.dll
    •  gacutil /i policy-2.0.Npgsql.dll


With the above setup you can use ds2webdriver.exe in ds2/drivers or the direct ds2pgsqldriver.exe in ds2/pgsqlds2. More on running the benchmark driver itself in another post.

Wednesday, October 26, 2011

How does PostgreSQL HA works in vFabric Data Director?

Databases go down for various reasons, some known and some unknown.
Common ones are hardware failure, software failure, an unresponsive database, etc. Deciding what counts as a failure is itself one of the tasks. Many DBAs use a simple SELECT statement as a test to make sure the database is up and working. But what does one do if that simple SELECT fails? I remember, years ago, working on a module that would page engineers in sequence (and eventually their managers, if the engineers failed to respond in a certain expected way). In this email/text age, scripts start sending out emails and text messages. What we are in, basically, is an Event -> React -> Respond mode of operation.

However, true HA needs to lower downtime, which can only be done by switching the mode of operation to Event -> Respond -> React: when such an event happens, perform an automated response first, and then react by waking the engineers up :-)

How do you set this up in vFabric Data Director? Open the database properties, select the Database Configuration tab, and set "High Availability" to "Enable". This is also referred to as the One-Click HA setting.

Of course, this assumes that your virtual data cluster is set up properly to provide high-availability services. How do you do that? You need at least two ESXi hosts, so that if one host fails the other can cover for it, and the vSphere HA property must be enabled on the virtual data center cluster. Note that these settings are all "required" for the vFabric Database setup, and a "supported" setup does mandate at least two ESXi hosts in order for HA to work.

Now that we have gone over the setup requirements, let's go over how the application or user sees it. A user is connected to the database using the connection string. Something happens, the database goes down, and the connection drops. Chances are that if you reconnect immediately it may fail. However, within a certain time, expected to be less than 5 minutes by default (which we call our Recovery Time Objective, or RTO), if you try again you can connect to the database.
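
From the client side this means a dumb retry loop is enough to ride out a failover; a trivial sketch, with an invented host name and interval:

until psql -h mydb.example.com -d mydb -c 'SELECT 1'; do
  sleep 10   # keep retrying until the database is back
done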

So what happens in the background? Well, if it were magic, we would not tell you. But it is not really magic, though it feels like it. Here is what will typically happen.
For some reason PostgreSQL stops responding; it could be a "hung" situation, or the PostgreSQL server process has died. A small heartbeat monitor figures out the status of the database. If it notices a hung situation or a missing DB server process, it will try to restart the database. If the database cannot be restarted (because the whole VM appliance no longer responds), it will, in novice terms, kill the virtual machine. The vCenter Server, which has its own heartbeat on the VM appliance, will see that the virtual machine has died (irrespective of the database monitor, which may not be working if the whole host dies) and will restart the VM appliance on another server.

Since shared storage is a requirement, the VM appliance will start on another host, where it looks like a reboot. Once the VM starts, the PostgreSQL server process is restarted and goes into recovery mode. The biggest question at this point is typically how long recovery will take. Based on internal tests, even with the heaviest workload on 8 vCPUs, recovery can finish within the checkpoint_timeout setting, which means our Recovery Time Objective is guided by checkpoint_timeout + heartbeat latency + the time to restart the VM on another host. Overall we try to fit that into our Recovery Time Objective of 5 minutes.

Great, the virtual machine has restarted and the database has done its recovery and is working again. Now what? Don't forget that in this cloud setup the easiest thing is to use DHCP addresses. Unfortunately, DHCP addresses are not guaranteed to be the same after a reboot, and rebooting on a different host makes getting the same IP even less likely. This IP address change can cause database connectivity to be lost for the actual end user. To shield end users from this complexity, we implemented, in a sense, our own database name server. However, this can only work by modifying the clients to reference the database using a "Virtual Hosts" format, so that clients can always find their intended database without worrying about where it is running. A minor change in the PostgreSQL clients, but a huge complexity reducer for end users, who no longer have to fix up their IP addresses or domain names after the database moves.

Aha, now this explains why vPostgres ships its own clients and libpq library, which is API-compatible with the standard PostgreSQL libpq. The libpq library is actually 100% compatible with the standard PostgreSQL libpq; the only addition is the Virtual Hosts feature, which is critical for HA to work seamlessly without users being concerned about the actual IP of the database. Without the change, HA will not work in this framework. Since it is 100% compatible, an application that works with standard libpq will work with vPostgres libpq. Similar changes have also been made in the JDBC and ODBC drivers for vPostgres, so HA is supported across all supported clients.

That said, if you use standard libpq/psql or other standard clients, know the IP address of the vPostgres database, and connect via that IP address (rather than the virtual host string), it will still work flawlessly. However, if the database goes down and restarts with a new IP address, the client will have no way to discover the new address and will have to bug the administrator to find it.

A note for folks familiar with vSphere terminology: HA is not FT (Fault Tolerance), which is a different take on HA that further reduces downtime from minutes to seconds. More on that in the future.

Thursday, October 13, 2011

Using PostgreSQL Server on Micro Cloud Foundry

With the recent news that PostgreSQL is now available in Micro Cloud Foundry, I decided to take it for a test spin. I downloaded the Micro Cloud Foundry VM zip file, which is about 1.0GB. After downloading it, I unzipped it on my MacBook Pro and used VMware Fusion 4.0.2 to open the VM. As the VM booted up, the console showed the message:

Micro Cloud Foundry not configured

I selected option 1 to configure the Micro Cloud. It asked me to configure my VM user password and networking (DHCP or static), and then asked me to enter my Cloud Foundry configuration token, which had been provided to me after I created a pgtest.cloudfoundry.me domain just before the download.

It took about 5 minutes to set up the cloud.

After the setup, I had my Micro Cloud Foundry running on my local IP (it looked like a bridged connection rather than NAT).

Then I installed the VMC tool on my Mac (this needs Ruby).
(NOTE: skip directly to the ssh part if you do not want to install Ruby/vmc.)

$ gem install vmc

$ vmc target http://api.pgtest.cloudfoundry.me

This got me connected to my micro cloud.
Then I did
$ vmc register
to create my user account using an email id and password.
Then I logged into the Micro Cloud using
$ vmc login

Now when I run the following, I see the PostgreSQL service available along with the other databases:

$ vmc services

============== System Services ==============

+------------+---------+---------------------------------------+
| Service    | Version | Description                           |
+------------+---------+---------------------------------------+
| mongodb    | 1.8     | MongoDB NoSQL store                   |
| mysql      | 5.1     | MySQL database service                |
| postgresql | 9.0     | PostgreSQL database service (vFabric) |
| rabbitmq   | 2.4     | RabbitMQ messaging service            |
| redis      | 2.2     | Redis key-value store service         |
+------------+---------+---------------------------------------+

=========== Provisioned Services ============
As you can see, there are no provisioned services currently.


If you are a Java/Spring developer, you may want to create an application using Xin Li's post "PostgreSQL for Micro Cloud Foundry - Spring Tutorial".

I am not interested in developing Java applications, though; I want access to the PostgreSQL server directly.

Now comes the ssh part.

Currently the PostgreSQL server is not exposed externally from the Micro Cloud.
But on the console of the Micro Cloud VM you can configure the password of the vcap user, which means you now have ssh access to the Micro Cloud VM.

$ ssh vcap@microcloudip

$ cd /var/vcap/store/postgresql
$ vi postgresql.conf 

and edit listen_addresses to add your database client's IP address there.
For my demo setup I just opened it to all:
listen_addresses='*'
(Depending on how pg_hba.conf is set up, you may also need an entry there permitting your client's address.)

Next, assign a Postgres password to the "vcap" user:
$ /var/vcap/packages/postgresql/bin/psql -d postgres
psql (9.0.4)
Type "help" for help.

postgres=# ALTER USER vcap WITH PASSWORD 'secret';
ALTER ROLE
postgres=# \q

Now I exit the Micro Cloud VM and, using the console, restart the services.
Now the PostgreSQL service can be accessed from a Postgres client anywhere.

For example, from a MacBook Pro:

$ psql -h microcloudip -d postgres -U vcap
Password for user vcap:
psql (9.0.5, server 9.0.4)
Type "help" for help.

postgres=#


Try it out!

Thursday, September 29, 2011

#PGWest 2011 - Using vFabric Postgres - A DB User's Perspective

Here are my slides from my #PGWest 2011 presentation "Using vFabric Postgres - A DB User's Perspective", covering vPostgres databases as deployed by vFabric Data Director.

Tuesday, September 13, 2011

Next Stop: Postgres Open 2011 - Chicago

Soon I will be in Chicago for Postgres Open 2011, where I will once again present
"Running PostgreSQL on Virtualized Environments" on Thursday at 11:30am in the Cotillion Ballroom (according to the current schedule).

The title should really be "Running Community PostgreSQL on Virtualized Environments", since the presentation draws on the experience of running community PostgreSQL in virtual machines. Most of it is really a checklist that helps you get the most out of PostgreSQL in VMs. Of course, depending on the VM and the underlying infrastructure, your mileage may vary.

Overall, I am also looking forward to attending other sessions (besides mine, of course). Greg Smith has a couple of sessions that sound interesting. There is a keynote by Charles Fan on Friday (that should be a no-brainer for me). Probably the most interesting to me is "Unlocking the Postgres Lock Manager" by Bruce Momjian, which is definitely in line with my interests (since I try to control/avoid/reduce lock contention and LWLocks in benchmarks, which actually led to my past PGCon presentation on "Understanding Postgres LWLocks").

Definitely looking forward to the trip.

Friday, May 20, 2011

Understanding PostgreSQL LWLocks - PGCon 2011


From PGCon 2011, here are my slides on "Understanding PostgreSQL LWLocks".




Or alternatively at the direct link.

Wednesday, March 23, 2011

Introduction to PostgreSQL for System Administrators - #pgeast 2011

Here are the slides from my presentation today at PGEast 2011.



It turned out to be a fast-paced presentation for 45 minutes. Online feedback and questions are welcome, as usual.


Update: my next presentation, "PostgreSQL in Virtualized Environments", has moved to tomorrow (Thursday) at 9:00am in the Boardroom for the early-bird attendees :-)


Wednesday, March 16, 2011

PgEast 2011 - New York

It's less than a week until PgEast 2011, and this time I have two sessions there.

My first session is on Wednesday at 3:00pm, titled "Introduction to PostgreSQL for System Administrators". This session is not about learning SQL or any real database feature; it is meant to give system administrators their first exposure to PostgreSQL as an application running on their systems. We look at basic installation and some internals, to understand the various processes running on the system.

My second session is on Friday at 10:05am, titled "Running Postgres on Virtualized Environments". This session is about running Postgres in a VM using VMware's vSphere. While much of the content is specific to vSphere, it can help in general when running Postgres on any virtual platform.

With the increasing adoption of "cloud computing" across industries, I expect the usage of Postgres on VMs to increase dramatically in the next few years. PgEast 2011 has a roundtable discussion with a panel of experts on Thursday at 2:30pm to answer precisely these questions about deploying PostgreSQL in the cloud.