Wednesday, October 26, 2011

How does PostgreSQL HA works in vFabric Data Director?

Databases go down due to various reasons. Some reasons are known and some unknown.
Common reasons are hardware failure, software failure, database unresponsive, etc. What is considered as a failure is actually one of the tasks. Various DBA's use a simple select statement as a test to make sure that the database is up and working. But what does one do if that simple select statement fails. I remembers years ago I worked on a module which will start paging  engineers in a sequence (and eventually their managers if the engineers failed to respond back in a certain expected way).  In this email/text age, scripts will start sending out emails and text messages.  What we are is basically in the Event->React-> Respond mode of operation.

However true HA needs to lower downtime which can  only be done by having the mode of operation as Event->Respond->React. To explain that when such an event happens, do an automated response first and then React to wake the engineers up :-)

How do you set this up in vFabric Data Director? This can be achieved by selecting the database properties, selecting the Database Configuration tab and set "High Availability" to "Enable". This is also refered as One-Click HA setting.

Of course this assumes that your virtual Data Cluster is set properly for providing the high availability services. How do you set it up properly? Well you need atleast two ESXi Hosts so if one host fails, the other can cover for it. Also vSphere HA property has been enabled in the Virtual Data Center Cluster. Note these settings are all "required" for vFabric Database setup and a "supported" setup does mandate atleast two ESXi Hosts in order for HA to work.

Now that we have gone over the setup requirements, lets go over the scenarios on how the application or user sees it.  A user is connected to the database using the connection string. Something happens and the database goes down and the connection drops. Chances are if you reconnect again immediately it may fail. However with certain time which is expected to be less than 5 minutes (which we call our Recovery Time Objective or RTO)  by default, if you try again you can connect to the database again.

So what happens in the background? Well if it was Magic, we would not tell you. But it is not really magic though it feels like that. Here is what will typically happen in the background.
For some reason the PostgreSQL fails to respond anymore it could be a "hung" situation or the PostgreSQL server has died. There is a small heartbeat monitor which figures out the status of the database. If it notices that the hung situation or no DB server process, it will try to restart the database. If the database cannot be restarted (because the whole VM appliance cannot respond anymore), it will in novice terms kill the virtual machine. The vCenter Server which has its own heartbeat on the VM appliance will see that the Virtual Machine has died (irrespective of the Database Monitor which may not be working if the whole host dies), the vCenter Server will restart the VM appliance on another server.

Since shared-storage is a requirement, the VM appliance will start on another host and it will feel like a reboot. Once the VM starts, the PostgreSQL server process will be restarted. At this point of time, the PostgreSQL server goes into recovery mode. The biggest question at this point of time typically is how long will the recovery mode take. Typically based on internal tests even with the heaviest workload on 8vCPU, the recovery time can finish within the checkpoint_timeout settings which means our Recovery Time Objective is guided by checkpoint_timeout + heartbeat latency + the time to restart the VM on another hosts.  Overall we try to fit that into our Recovery Time Objective of 5 minutes.

Great the virtual machine has restarted and the database has done its recovery and working again. Now what? Well dont forget in this cloud setup, the easiest thing is to use DHCP addresses. Unfortunately DHCP addresses are not guaranteed to be same after reboot . Plus rebooting on a different host makes it more complex to get the same IP. This IP address change can cause the Database connectivity to be lost to the actual end user.   In order to shield the end users from this complexity, we sort of implemented our own Database Name Server. However this can only work by modifying the clients which references the database using this "Virtual Hosts" format so that the clients can always find their intended database without really worrying about where it is running. A minor change in the PostgreSQL clients but a huge complexity reducer for end users to fix their IP addresses or domain names to the changed location.

Aha now this explains why vPostgres ships their own clients and libpq library which is API compatible with standard PostgreSQL libpq library.The libpq library is actually 100% compatible with standard PostgreSQL Libpq library. The only addition it has is the feature of Virtual Hosts which is critical for HA to work seemlessly without the users being concerned about the actual IP of the database. Without the change, HA will not work on the framework. Since it is 100% compatible, if an application works standard libpq it will work with vPostgres libpq. Similar changes are also done in the JDBC driver and ODBC Driver for vPostgres so HA is supported across all supported clients.

That said if you use standard libpq/psql and other standard clients and you know the IP Address of the vPostgres database and connect to it via that IP address (and not the virtual host string)  it will still work flawlessly. However if the database goes down and restarts with a new IP address then the client will have no ability to figure out the new IP address and will have to bug the Administrator to figure out the new IP address.

Though for folks familiar with vSphere terminology, HA is not FT - Fault Tolerant which is a different take on HA to further reduce downtime from minutes to seconds. More on that in future.

Post a Comment