Friday, July 20, 2012

Postgres @ VMware

VMware vCenter Server Appliance 5.0 u1a was recently released with an embedded Postgres-based database. Check out the release notes.

Quote from release notes:
 "vCenter Server Appliance Database Support: The DB2 express embedded database provided with the vCenter Server Appliance has been replaced with VMware vPostgres database. This decreases the appliance footprint and reduces the time to deploy vCenter Server further."

vCenter Server Appliance joins the growing list of VMware products embedding and/or supporting Postgres.

Tuesday, December 20, 2011

Using DVDStore with PostgreSQL

We now have support for PostgreSQL in the popular DVDStore benchmark, which stresses a database using an emulated DVD store e-commerce website. The DVDStore benchmark is maintained by Dave Jaffe (Dell) and Todd Muirhead (VMware). It is an open source database test kit. The beauty of the kit is that it allows the same web application to be deployed in any of the following ways:
  1. Java/Tomcat connecting to the database,
  2. Web Server/PHP connecting to the database,
  3. IIS/ASP.NET connecting to the database, or
  4. Direct connection to the database, invoking the business logic as stored procedures stored in the database itself.

Currently the PostgreSQL implementation details are as follows:
  1. Java/Tomcat using the PostgreSQL JDBC driver.
  2. Web Server/PHP using the PHP-postgres modules, which use libpq.
  3. There is currently no IIS/ASP.NET web app implementation for PostgreSQL.
  4. Direct connection to the PostgreSQL database with the business logic implemented in stored procedures. The driver is written in C# on .NET and requires Npgsql 2.0.11.0.

Setting up the database is relatively easy.
  1. Download ds21.tar.gz and ds21_postgresql.tar.gz from http://linux.dell.com/dvdstore/
  2. Extract them on the system running PostgreSQL.
  3. The default data size is 10MB. If you want a different size, execute 'perl Install_DVDStore.pl' in the ds2 directory. (This expects Perl to be available on the system. I answered the prompts with 100, MB, PGSQL and LINUX, respectively.)
  4. Assuming you are logged on as the DB owner and the database is on localhost at port 5432, execute the script pgsql_create_all.sh in the ds2/pgsqlds2 directory. It will create a database "ds2" and two users "ds2/ds2" and "web/web", then create tables, load tables, create indexes, update sequences and finally run analyze. (The script needs to be modified slightly if the database is already hardened and you want to control the creation of the database and the users.)

The load driver itself was designed for the .NET platform, so it is probably easiest to set it up on a separate Windows machine as follows:
  1. Download and install Windows SDK v6.1 and the .NET 3.5 framework on a Windows client machine.
  2. Once installed, start the CMD prompt from Programs -> Windows SDK v6.1 -> CMD Prompt.
  3. Verify that this CMD prompt has the path set up for gacutil (try 'gacutil /l').
  4. Download Npgsql 2.0.11 for msnet35 and install the DLLs using gacutil.exe. (Note that other versions of Npgsql may have issues.)
    •  gacutil /i Npgsql.dll
    •  gacutil /i Mono.Security.dll
    •  gacutil /i policy-2.0.Npgsql.dll


With the above setup you can use ds2webdriver.exe in ds2/drivers or the direct driver ds2pgsqldriver.exe in ds2/pgsqlds2. More on running the benchmark driver itself in another post.

Wednesday, October 26, 2011

How does PostgreSQL HA work in vFabric Data Director?

Databases go down for various reasons. Some reasons are known and some are unknown.
Common reasons are hardware failure, software failure, an unresponsive database, etc. Deciding what counts as a failure is itself one of the tasks. Many DBAs use a simple select statement as a test to make sure that the database is up and working. But what does one do if that simple select statement fails? I remember that years ago I worked on a module which would start paging engineers in sequence (and eventually their managers, if the engineers failed to respond in a certain expected way). In this email/text age, scripts start sending out emails and text messages instead. Either way, we are basically in an Event->React->Respond mode of operation.

However, true HA needs to lower downtime, which can only be done by changing the mode of operation to Event->Respond->React. In other words, when such an event happens, perform an automated response first and only then React by waking the engineers up :-)

How do you set this up in vFabric Data Director? Select the database properties, select the Database Configuration tab and set "High Availability" to "Enable". This is also referred to as the One-Click HA setting.

Of course this assumes that your virtual Data Cluster is set up properly to provide high availability services. How do you set it up properly? Well, you need at least two ESXi hosts, so that if one host fails the other can cover for it. The vSphere HA property must also be enabled on the Virtual Data Center Cluster. Note that these settings are all "required" for a vFabric Database setup, and a "supported" setup does mandate at least two ESXi hosts in order for HA to work.

Now that we have gone over the setup requirements, let's go over how the application or user sees a failure. A user is connected to the database using the connection string. Something happens, the database goes down and the connection drops. Chances are that if you reconnect immediately, the attempt will fail. However, within a certain time, which by default is expected to be less than 5 minutes (we call this our Recovery Time Objective, or RTO), you can try again and connect to the database.
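
From the application side, the behavior described above amounts to retrying the connection until the RTO window has passed. Here is a minimal Python sketch of that idea. The function name connect_with_retry, the 5 second delay and the use of ConnectionError are my own assumptions for illustration; a real driver raises its own exception type, and nothing here is part of vFabric Data Director.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def connect_with_retry(connect: Callable[[], T],
                       deadline_s: float = 300.0,   # assumed: the 5 minute RTO
                       delay_s: float = 5.0,        # assumed pause between attempts
                       sleep: Callable[[float], None] = time.sleep,
                       clock: Callable[[], float] = time.monotonic) -> T:
    """Call connect() until it succeeds or the deadline would be exceeded."""
    start = clock()
    while True:
        try:
            return connect()
        except ConnectionError:
            # Give up if waiting once more would take us past the deadline.
            if clock() - start + delay_s > deadline_s:
                raise
            sleep(delay_s)
```

Pass in whatever callable opens your database connection; the sleep and clock parameters exist only so the loop can be exercised without actually waiting.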

So what happens in the background? Well, if it were magic, we would not tell you. It is not really magic, though it feels like it. Here is what typically happens in the background.
For some reason PostgreSQL stops responding: it could be "hung", or the PostgreSQL server process may have died. A small heartbeat monitor tracks the status of the database. If it notices a hung database or a missing DB server process, it tries to restart the database. If the database cannot be restarted (because the whole VM appliance no longer responds), it will, in novice terms, kill the virtual machine. The vCenter Server, which has its own heartbeat on the VM appliance, then sees that the virtual machine has died (independently of the database monitor, which may not be working if the whole host dies) and restarts the VM appliance on another host.
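
The escalation described above can be sketched as a small decision function. This is only my illustration of the Event->Respond ordering, not the actual monitor shipped in the appliance; the names Action, heartbeat_step, probe and restart_db are hypothetical, and the probe stands in for something like the simple select statement mentioned earlier.

```python
from enum import Enum
from typing import Callable


class Action(Enum):
    NONE = "none"              # database answered the probe
    RESTART_DB = "restart_db"  # probe failed, restarting the server fixed it
    KILL_VM = "kill_vm"        # restart did not help, leave it to vSphere HA


def heartbeat_step(probe: Callable[[], bool],
                   restart_db: Callable[[], bool]) -> Action:
    """Run one heartbeat cycle and report the strongest action taken."""
    if probe():
        return Action.NONE
    # The database looks hung or dead: respond automatically first.
    if restart_db() and probe():
        return Action.RESTART_DB
    # Restart failed or the database still does not answer: escalate.
    return Action.KILL_VM
```

A real monitor would run this in a loop on a timer and notify the engineers only after the automated response has been attempted.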

Since shared storage is a requirement, the VM appliance starts on another host and it feels like a reboot. Once the VM starts, the PostgreSQL server process is restarted. At this point the PostgreSQL server goes into recovery mode. The biggest question at this point is typically how long recovery will take. Based on internal tests, even with the heaviest workload on 8 vCPUs, recovery typically finishes within the checkpoint_timeout setting, which means our Recovery Time Objective is governed by checkpoint_timeout + heartbeat latency + the time to restart the VM on another host. Overall we try to fit that into our Recovery Time Objective of 5 minutes.
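
To make the arithmetic concrete, here is a tiny Python sketch of that sum. The numbers in the usage note are made-up illustrative values, not measurements from Data Director, and the function names are my own.

```python
def estimated_recovery_seconds(checkpoint_timeout_s: float,
                               heartbeat_latency_s: float,
                               vm_restart_s: float) -> float:
    """Worst-case estimate: checkpoint_timeout + heartbeat latency + VM restart."""
    return checkpoint_timeout_s + heartbeat_latency_s + vm_restart_s


def fits_rto(estimate_s: float, rto_s: float = 300.0) -> bool:
    """True when the estimate fits within the RTO (5 minutes by default)."""
    return estimate_s <= rto_s
```

For example, with hypothetical values of 120 seconds for checkpoint_timeout, 30 seconds of heartbeat latency and 90 seconds to restart the VM, the estimate is 240 seconds, which fits in the 5 minute RTO. With checkpoint_timeout at 300 seconds and the other values unchanged, the estimate is 420 seconds and no longer fits.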

Great, the virtual machine has restarted, and the database has finished recovery and is working again. Now what? Well, don't forget that in this cloud setup the easiest thing is to use DHCP addresses. Unfortunately DHCP addresses are not guaranteed to be the same after a reboot, and rebooting on a different host makes it even harder to get the same IP. This IP address change can cause the end user to lose connectivity to the database. In order to shield end users from this complexity, we more or less implemented our own Database Name Server. However, this can only work by modifying the clients so that they reference the database using a "Virtual Hosts" format and can always find their intended database without worrying about where it is running. It is a minor change in the PostgreSQL clients, but it spares end users from having to fix up their IP addresses or domain names for the changed location.

Aha, now this explains why vPostgres ships its own clients and libpq library. The vPostgres libpq is 100% API compatible with the standard PostgreSQL libpq library. The only addition is the Virtual Hosts feature, which is critical for HA to work seamlessly without users being concerned about the actual IP of the database. Without that change, HA will not work in this framework. Since it is 100% compatible, an application that works with standard libpq will work with vPostgres libpq. Similar changes have been made in the JDBC and ODBC drivers for vPostgres, so HA is supported across all supported clients.

That said, if you use standard libpq/psql or other standard clients, know the IP address of the vPostgres database and connect to it via that IP address (and not the virtual host string), it will still work flawlessly. However, if the database goes down and restarts with a new IP address, the client has no way to figure out the new address and you will have to bug the administrator for it.

For folks familiar with vSphere terminology, note that HA is not FT (Fault Tolerance), which is a different take on availability that further reduces downtime from minutes to seconds. More on that in the future.

Thursday, October 13, 2011

Using PostgreSQL Server on Micro Cloud Foundry

With the recent news that PostgreSQL is now available in Micro Cloud Foundry, I decided to take it for a test spin. I downloaded the Micro Cloud Foundry VM zip file, which is about 1.0GB. After downloading it I unzipped it on my MacBook Pro and used VMware Fusion 4.0.2 to open the VM. As the VM booted up, the console showed the message

Micro Cloud Foundry not configured

I selected option 1 to configure the Micro Cloud. It asked me to configure my VM user password and networking (DHCP or static), and then asked me to enter my Cloud Foundry configuration token, which was provided to me when I created the pgtest.cloudfoundry.me domain just before the download.

It took about 5 minutes to set up the cloud.

After the setup, my Micro Cloud Foundry was configured with a local IP (it looked like a bridged connection rather than NAT).

Then I installed the vmc tool on my Mac using the following (this needs Ruby).
(NOTE: Skip directly to the ssh part if you do not want to install Ruby/vmc.)

$ gem install vmc

$ vmc target http://api.pgtest.cloudfoundry.me

That got me connected to my Micro Cloud.
Then I ran
$ vmc register
to create my user account using an email ID and password.
Then I logged into the Micro Cloud using
$ vmc login

Now when I run the following, I see the PostgreSQL service available alongside the other databases.

$ vmc services

============== System Services ==============

+------------+---------+---------------------------------------+
| Service    | Version | Description                           |
+------------+---------+---------------------------------------+
| mongodb    | 1.8     | MongoDB NoSQL store                   |
| mysql      | 5.1     | MySQL database service                |
| postgresql | 9.0     | PostgreSQL database service (vFabric) |
| rabbitmq   | 2.4     | RabbitMQ messaging service            |
| redis      | 2.2     | Redis key-value store service         |
+------------+---------+---------------------------------------+

=========== Provisioned Services ============
As you can see there are no provisioned services currently.


At this point, if you are a Java/Spring developer, you may want to create an application by following Xin Li's post "PostgreSQL for Micro Cloud Foundry- Spring Tutorial".

I am not interested in developing Java applications, but I do want direct access to the PostgreSQL server.

Now comes the ssh part.

Currently the PostgreSQL server is not exposed externally from the Micro Cloud.
But on the console of the Micro Cloud VM you can configure the password of the vcap user, which means you now have ssh access to the Micro Cloud VM.

$ ssh vcap@microcloudip

$ cd /var/vcap/store/postgresql
$ vi postgresql.conf 

and edit listen_addresses so that the server listens on an address your database client can reach.
For my demo setup I just opened it to all interfaces:
listen_addresses='*'
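
If you would rather script that edit than do it by hand in vi, here is a minimal Python sketch that rewrites the setting in the text of a postgresql.conf. The helper name set_listen_addresses is hypothetical, and it assumes the setting appears at most once in the file, possibly commented out.

```python
import re


def set_listen_addresses(conf_text: str, value: str = "*") -> str:
    """Return conf_text with listen_addresses set to the given value."""
    new_line = f"listen_addresses = '{value}'"
    # Match an existing (possibly commented-out) setting on a single line.
    pattern = re.compile(r"^[ \t]*#?[ \t]*listen_addresses[ \t]*=.*$",
                         re.MULTILINE)
    if pattern.search(conf_text):
        return pattern.sub(lambda m: new_line, conf_text, count=1)
    # No existing line: append the setting at the end of the file.
    sep = "" if conf_text.endswith("\n") or not conf_text else "\n"
    return conf_text + sep + new_line + "\n"
```

You would read the file, pass its contents through this function and write the result back. Note that any trailing comment on the original line is dropped.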

Next, assign a Postgres password for the "vcap" user:
$ /var/vcap/packages/postgresql/bin/psql -d postgres
psql (9.0.4)
Type "help" for help.

postgres=# ALTER USER vcap WITH PASSWORD 'secret';
ALTER ROLE
postgres=# \q

Now I exit from the Micro Cloud VM and restart the services from the console.
The PostgreSQL service can now be accessed from a Postgres client on another machine.

For example, from a MacBook Pro:

$ psql -h microcloudip -d postgres -U vcap
Password for user vcap:
psql (9.0.5, server 9.0.4)
Type "help" for help.

postgres=#


Try it out!