Friday, June 08, 2018

Setting up PostgreSQL 11 Beta 1 in Amazon RDS Database Preview Environment

PostgreSQL 11 Beta 1 has been out for more than a couple of weeks. The best way to experience it is to try out the new version and test drive it yourself.

Rather than building it directly from source, I take the easy way out and deploy it in the cloud. Fortunately, it is already available in Amazon RDS Database Preview Environment.

For this post I am going to use the AWS CLI, since command lines are easy to understand and copy/paste, and also easier to script for repetitive testing. To use the Database Preview Environment, the endpoint has to be modified to point to the preview environment instead of the default endpoint for the region.

Because there might be multiple PostgreSQL 11 beta releases, it is important to understand which build version is being deployed. I can always leave it at the default, which typically would be the latest preferred version, but a lot of the time I want to be sure of the version I am deploying. The command to get all the versions of PostgreSQL 11 is describe-db-engine-versions.

$ aws rds describe-db-engine-versions --engine postgres --db-parameter-group-family postgres11 --endpoint-url 
{
    "DBEngineVersions": [
        {
            "Engine": "postgres", 
            "DBParameterGroupFamily": "postgres11", 
            "SupportsLogExportsToCloudwatchLogs": false, 
            "SupportsReadReplica": true, 
            "DBEngineDescription": "PostgreSQL", 
            "EngineVersion": "11.20180419", 
            "DBEngineVersionDescription": "PostgreSQL 11.20180419 (68c23cba)", 
            "ValidUpgradeTarget": [
                {
                    "Engine": "postgres", 
                    "IsMajorVersionUpgrade": false, 
                    "AutoUpgrade": false, 
                    "EngineVersion": "11.20180524"
                }
            ]
        }, 
        {
            "Engine": "postgres", 
            "DBParameterGroupFamily": "postgres11", 
            "SupportsLogExportsToCloudwatchLogs": false, 
            "SupportsReadReplica": true, 
            "DBEngineDescription": "PostgreSQL", 
            "EngineVersion": "11.20180524", 
            "DBEngineVersionDescription": "PostgreSQL 11.20180524 (BETA1)", 
            "ValidUpgradeTarget": []
        }
    ]
}


From the above, I see there are two versions: 11.20180419 and 11.20180524. The versions are based on a datestamp, with the description showing the tag information of the build. Since I am interested in the BETA1 version, I use 11.20180524.
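If I want to script this selection instead of eyeballing the JSON, a small helper can pick the right build out of the describe-db-engine-versions output. This is a minimal sketch; the sample payload below is abbreviated from the output above, and in practice you would feed it the real CLI output:

```python
import json

# Abbreviated sample of `aws rds describe-db-engine-versions` output (from above).
SAMPLE = json.loads("""
{
  "DBEngineVersions": [
    {"EngineVersion": "11.20180419",
     "DBEngineVersionDescription": "PostgreSQL 11.20180419 (68c23cba)"},
    {"EngineVersion": "11.20180524",
     "DBEngineVersionDescription": "PostgreSQL 11.20180524 (BETA1)"}
  ]
}
""")

def latest_version(payload):
    """Return the newest EngineVersion; the datestamp suffix sorts lexically."""
    return max(v["EngineVersion"] for v in payload["DBEngineVersions"])

def find_by_tag(payload, tag):
    """Return the EngineVersion whose description mentions the given tag, e.g. BETA1."""
    for v in payload["DBEngineVersions"]:
        if tag in v["DBEngineVersionDescription"]:
            return v["EngineVersion"]
    return None

print(latest_version(SAMPLE))          # 11.20180524
print(find_by_tag(SAMPLE, "BETA1"))    # 11.20180524
```

Because the versions are datestamps, a plain lexical max() is enough to find the newest build.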

$ aws rds create-db-instance  --endpoint  --allocated-storage 100 --db-instance-class db.t2.small  --db-name benchdb  --master-username SECRET  --master-user-password XXXXX  --engine postgres  --engine-version 11.20180524   --db-instance-identifier pg11beta1

Once deployed, I can always get the endpoint of the instance as follows:

$ aws rds describe-db-instances --endpoint= --db-instance-identifier pg11beta1 |grep Address
                "Address": ""

In my account I have already added my client to my default security group, so I can connect to the new instance right away.

$ psql -h -U pgadmin -d benchdb -c 'SELECT VERSION()'
 PostgreSQL 11beta1 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit

It is hard to test a database without any data. Normally I would just run pgbench directly against it to load data. But this time I want to try a different way of loading data into this instance, similar to how people would typically migrate from an existing production setup. For that, I first need to set up a production database.

Before I create the production database instance, I first create a custom parameter group so that I can enable the settings I typically use in a production database. In the preview environment I created a PostgreSQL 11 family parameter group and edited it to change the following parameter:

rds.logical_replication = 1

and saved the group.
Next, I create my production instance using the newly created parameter group, passed with --db-parameter-group-name (I am calling it MYPARAMGROUP here).

$ aws rds create-db-instance --allocated-storage 100 --db-instance-class db.t2.small --engine postgres --db-name benchdb --master-username pgadmin --master-user-password SECRET --db-parameter-group-name MYPARAMGROUP --db-instance-identifier pg11prod

It is still empty, so I fill it up with my production data.

$ pgbench -i -s 100  -h -U pgadmin benchdb
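For reference, the pgbench table sizes are determined entirely by the -s scale factor: one branch per scale unit, ten tellers per branch, and 100,000 accounts per branch. A quick sketch of the expected row counts at scale 100:

```python
def pgbench_row_counts(scale):
    """Expected rows in the pgbench tables for a given -s scale factor."""
    return {
        "pgbench_branches": scale,           # 1 branch per scale unit
        "pgbench_tellers": 10 * scale,       # 10 tellers per branch
        "pgbench_accounts": 100000 * scale,  # 100,000 accounts per branch
        "pgbench_history": 0,                # empty after -i; filled during runs
    }

counts = pgbench_row_counts(100)
print(counts["pgbench_accounts"])  # 10000000
```

So a scale-100 load gives ten million account rows, which is a reasonable amount of data to watch replicate.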

Now I have a typical setup with one production instance and another empty test instance. I now have to figure out how to get the data into my test instance. I could always dump all the data using pg_dump and restore it on the new instance, but this time I am going to try logical replication.

For setting up logical replication between two instances, I first need to recreate the schema on the test instance. pg_dump provides a flag, -s, to dump just the schema with no data. I dump the schema from the production setup

$ pg_dump -s  -h -U pgadmin benchdb > schema.txt

and then load the schema into my test setup

$ psql -h -U pgadmin -d benchdb -f schema.txt

Now I want to actually set up logical replication between the two instances. For this I need a replication user. I could use the master user, but that is too risky. So I create a new user with read-only privileges on the tables in the database and grant it the replication rights that work in Amazon RDS.

$ psql -h -U pgadmin benchdb

benchdb=> CREATE USER repluser WITH PASSWORD 'SECRET';

benchdb=> GRANT rds_replication TO repluser;

benchdb=> GRANT SELECT ON ALL TABLES IN SCHEMA public TO repluser;

Next, I have to set up a publication for all tables in the production database:

benchdb=> CREATE PUBLICATION pgprod11 FOR ALL TABLES;

One more thing to add here is to change the inbound rules of the security group of the production instance to allow the test instance to connect.

On my test instance I need to create a subscription to subscribe to all changes happening on my production setup.

$ psql -h -U pgadmin benchdb

benchdb=> CREATE SUBSCRIPTION pg11beta1 CONNECTION ' dbname=benchdb user=repluser password=SECRET' PUBLICATION pgprod11;
NOTICE:  created replication slot "pg11beta1" on publisher

Note: if the command itself takes a long time to execute, it typically means that it cannot connect to the production instance. Check the security group to make sure the rule allowing your test instance to connect is set properly. If the connection is allowed, the command returns almost instantaneously; however, the actual data may still be loading behind the scenes.
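If I want to script the wait for the initial table sync instead of checking by hand, a small polling helper can retry a check until it succeeds or times out. This is a generic sketch; `row_counts_match` in the commented usage line is a hypothetical function standing in for whatever comparison you run against the two instances (for example, comparing counts of pgbench_accounts):

```python
import time

def wait_until(check, timeout=300, interval=5):
    """Poll `check()` until it returns True or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if check():
            return True
        time.sleep(interval)
    return False

# Hypothetical usage: `row_counts_match` would query both instances and
# compare the row counts of a table such as pgbench_accounts.
# wait_until(lambda: row_counts_match("pgbench_accounts"), timeout=600)
```

The helper is deliberately generic, so the same loop can later wait for the ongoing changes in pgbench_history to show up on the subscriber.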

After some time, I can see that my test instance has all the initial data from the production setup.

benchdb=> select count(*) from pgbench_branches;
(1 row)

benchdb=> select count(*) from pgbench_history;
(1 row)

(The table pgbench_history is typically empty after a fresh setup of pgbench)

Now let's run an application workload on our production database pg11prod.

$ pgbench -c 10 -T 300 -P 10 -h -U pgadmin benchdb

As the load starts (after the initial vacuum), log into the test instance and check for changes. With the default pgbench test, it is easy to verify changes by counting the entries in pgbench_history.

$ psql -h -U pgadmin benchdb
psql (10.4 (Ubuntu 10.4-2.pgdg16.04+1), server 11beta1)
Type "help" for help.

benchdb=> select count(*) from pgbench_history;
(1 row)

benchdb=> select count(*) from pgbench_history;
(1 row)

This is a simple test to see that changes are being propagated from the production instance to the test instance.

I finally have logical replication, using a read-only user, between two PostgreSQL 11 instances in the Amazon RDS Database Preview Environment.

Pretty cool!

Thursday, June 30, 2016

Hello Docker on Windows

NOTE (9/8/2016): This is an older post which I wrote a few months ago but never posted.

After using docker on Linux for more than a year, it was finally time to try it on a different platform. Trying docker on Windows Server 2016 TP4 was one way, but the experience there was a bit more complicated. So when I heard about docker on Windows 10, I was initially surprised. Why? Based on what I had seen, I had figured that it really needed Hyper-V to run, which I assumed was only available on the Windows Server line.

I guess I was wrong. Under Control Panel -> Programs and Features -> Turn Windows Features On or Off, there is a feature called Hyper-V which can be turned on.
But before you start searching for it and trying to turn it on, read the following to save yourself some hassle.

1. You need Windows 10 Pro (sorry, Windows 10 Home does not work).
2. You need a CPU which supports Virtualization and SLAT (called EPT on Intel, RVI on AMD).

With Task Manager -> Performance -> CPU it is easy to figure out whether Virtualization is supported. But SLAT is another story: systeminfo or coreinfo is required to figure that out. You may be able to turn on some components of Hyper-V on CPUs that do not support SLAT, but that will not be enough.
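The relevant lines of the systeminfo output can be checked by hand, or with a tiny parser like this. This is just a sketch: the sample text mimics the English-locale "Hyper-V Requirements" section that systeminfo prints, and other locales or Windows versions may word it differently:

```python
# Sample of the Hyper-V section that `systeminfo` prints (English locale).
SAMPLE = """\
Hyper-V Requirements:      VM Monitor Mode Extensions: Yes
                           Virtualization Enabled In Firmware: Yes
                           Second Level Address Translation: Yes
                           Data Execution Prevention Available: Yes
"""

def slat_supported(systeminfo_output):
    """Return True if the systeminfo output reports SLAT support."""
    for line in systeminfo_output.splitlines():
        if "Second Level Address Translation" in line:
            return line.strip().endswith("Yes")
    return False

print(slat_supported(SAMPLE))  # True
```

Pipe `systeminfo` output into this (or just eyeball the same line) to know up front whether Hyper-V, and therefore docker, will work on the machine.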

I really had to cycle through a few laptops with Intel Core2 Duo and Intel Pentium chips, which do support Virtualization but not SLAT, before finally coming across my dusty desktop with an AMD Phenom, which has Virtualization with SLAT support and runs Windows 10.

Of course, I then applied for the Docker beta program on Windows. The invitation came yesterday, and I finally got a chance to download the docker binaries and install them.

Once the installation (as Administrator, of course) finished, it gave the option to launch docker, and after the daemon finished launching in the background it showed a splash image with suggested next steps.

Good job, Docker, on the usability of showing me what to do next.

Next I deploy an nginx server, with a command along the lines of the one in the getting-started guide:

$ docker run -d -p 80:80 --name webserver nginx

Whoa!! If it did not strike you: I am running Linux images here on Windows!!
Now I can access the same in a browser as http://docker/
(This, I would say, was a bit of a struggle, since I had not read the doc properly: I tried http://localhost and http://LOCAL/ but only http://docker worked.)

Overall, very interesting and game changing for development on Windows!