Thursday, June 30, 2016

Hello Docker on Windows

NOTE (9/8/2016): This is an older post which I wrote a few months ago but never posted.

After using Docker on Linux for more than a year, it was finally time to try it on a different platform. Trying Docker on Windows Server 2016 TP4 was one way to do that, but that experience was a bit more complicated. So when I heard about Docker on Windows 10, I was initially surprised. Why? Based on what I had seen, it really needed Hyper-V to run, which I assumed was only available on the Windows Server line.

I guess I was wrong. Under Control Panel -> Programs and Features -> Turn Windows features on or off, there is a feature called Hyper-V which can be turned on.
Before you start searching for it and trying to turn it on, read the following to save yourself some hassle:

1. You need Windows 10 Pro (sorry, Windows 10 Home does not work)
2. You need a CPU that supports virtualization and SLAT (known as EPT on Intel CPUs).

Task Manager -> Performance -> CPU makes it easy to figure out whether virtualization is supported. SLAT is another story: you need systeminfo or coreinfo to figure that out. You may be able to turn on some of the Hyper-V components on CPUs that do not support SLAT, but that will not be enough.
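For example, a rough sketch of the check from a command prompt (the exact wording of the systeminfo output can vary by Windows build):

C:\> systeminfo | findstr /C:"Second Level Address Translation"
                           Second Level Address Translation: Yes

Sysinternals coreinfo -v gives the same answer from the CPU flags: look for EPT on Intel chips or NPT on AMD chips.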



I really had to cycle through a few laptops with Intel Core 2 Duo and Intel Pentium chips, which do support virtualization but not SLAT, before finally turning to my dusty desktop with an AMD Phenom, which supports both virtualization and SLAT and was already running Windows 10.

Of course, I then applied for the Docker for Windows beta program. The invitation came yesterday, and I finally got a chance to download the Docker binaries and install them.

Once the installation (as Administrator, of course) finished, it gave the option to launch Docker, and after the daemon started in the background it showed a splash image as follows:


Good job, Docker, on the usability of showing me what to do next:

Next, I deployed an nginx server as follows.
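Roughly, the command looks like this (a sketch assuming the stock nginx image published to host port 80; the container name "webserver" is just chosen here for illustration):

# Run the official nginx image in the background, publishing port 80 to the host
docker run -d --name webserver -p 80:80 nginx

# Verify that the container is running
docker ps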



Whoa!! In case it did not strike you: I am running Linux images here on Windows!!
Now I can access it in a browser at http://docker/
(This, I would say, was a bit of a struggle, since I had not read the docs properly: I tried http://127.0.0.1/, http://localhost/, and http://LOCAL/, but only http://docker worked.)
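One quick way to confirm from the command line that the container is actually serving the requests is to watch the nginx access log while loading the page (using the "webserver" container name from the sketch above):

# Tail the nginx access log; a request to http://docker/ should show up here
docker logs -f webserver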




Overall, very interesting and game-changing for development on Windows!

Monday, May 16, 2016

Yet another reboot of Application Architecture

Last week I attended Redis Conf 2016 at the Mission Bay Conference Center and was excited to see more than 650 attendees discussing Redis. It is interesting how Redis has grown from a pure caching solution to now supporting more of the data use cases of its customer base.

If we put the above in perspective, we can see how applications have changed over the years.

CHUI Era

In the years leading up to Y2K, applications were all monolithic: everything ran on a single setup, with people either using dumb terminals or using Windows or Unix clients to open a telnet session to a text-based interface, later often called "CHUI" - Character User Interface. Browsers were not yet popular, but Windows was picking up, and some "modern" applications at that point got their first Windows fat client, though these were still all-in-one Windows "GUI" applications.

GUI Era

Technically spanning the whole decade leading up to the year 2000, client-server technologies became more popular, with a centralized database and a front end built as either a Windows rich client, a Java rich client, or a browser-based "WebUI". Companies like Oracle and Sun made a killing at that time selling large centralized servers running databases, with essentially a rich client or WebUI client accessing the central server. In the later part of those years, three-tier systems started appearing, but the majority of enterprise applications were still rich clients.

Java Era

The middleware era was basically ruled by Java web application servers, leading to the "classic" three-tier system: database layer, middleware layer, presentation layer. This is the generation that heavily pushed SOA, leading to APIs everywhere. It is also the generation that led to XML hell, where everybody had to understand XML to interconnect everything. However, things were still monolithic, especially in the database layer and, to a lesser extent, the middleware layer. Scale was largely limited by Amdahl's law. To work around some of these scaling issues, more tiers were introduced, such as a "caching layer", "load proxies", etc.

ScaleOut Era

As databases became hard to scale on a single node, designs started changing, using new kinds of database systems that run across many smaller boxes and leading to new kinds of designs: sharding, shared-nothing, and shared-data database systems. This was the first reboot in some sense: "eventual consistency" paradigms became more popular, and applications were now being developed against these new paradigms of multi-node databases. Applications had to introduce new layers that understood the "intelligence" of the scale-out database: how to handle sharding, node reconnections, etc. The CAP theorem was discussed more than Amdahl's law. The number of tiers in such a scale-out application was already approaching ten distinct operational tiers. Some people were running multiple data centers, but those were primarily for DR use cases.

Cloud Era

With the advent of Amazon Web Services, a new round of application refactoring started, driven primarily by the realities of multiple data centers, variable latencies between services, and the need for real decoupling between tiers. Earlier, the tiers were more like "components" than services, since the assumption was that everything would be updated together. The notion of "change management" also started shifting toward continuous deployment to production. Applications became more complex, since some services were "always" in production mode, being served by third-party providers. Third-party API consumption became very popular. This moved the number of tiers from somewhere around 10 to more like 25-30 different tiers in an app.

MicroServices Era

With the advent of Linux containers like Docker and the adoption of microservices, yet another reboot of applications is happening, and this time at a faster pace than before. This is an interesting, ongoing era for applications. A tier is no longer a "component" of an application; it is a purpose-driven "service" in itself. Every service is versioned, API-accessible, and fully updatable on its own without impacting the rest of the application. This change is pushing the number of tiers in a typical enterprise application into the hundreds; I have heard of enterprises with 300-400 microservice-based tiers in their application, and many of these microservices are third-party services. There are advantages: there is no single monolithic "waterfall" release of the application anymore, and things that previously took months or years to build can now be built in hours or days. On the downside, there are just too many moving parts in the application now. Architectural changes to your data flows and use cases become very expensive, pre-deployment testing becomes difficult, and canary deployments become necessary to avoid the risk of introducing bugs and taking down the whole application. Nothing is bad about evolution; it is just that the way we think about managing applications will have to change with the changing landscape.


In conclusion, applications have changed over the years, and adapting to these changes is necessary for businesses to keep up with the competition and retain their technology edge in the market.


Wednesday, January 06, 2016

PostgreSQL and Linux Containers: #SouthbayPUG Presentation

It was great to talk about Linux containers tonight at the Southbay PostgreSQL User Group at Pivotal.
The slides are now posted online:



Wednesday, September 30, 2015

Mirror Mirror on the wall, Where's the data? In my Vol

When I first started working with Docker last year, there was already a clear pattern out there: the Docker image itself consists only of the application binary (and, depending on your philosophy, the entire set of OS libraries it requires), and all application data goes in a volume.

The concept of a "data container" also seemed to be somewhat popular at that time. Not everyone bought into that philosophy, and various other patterns were emerging for how people used volumes with their Docker containers.

One of the emerging patterns was (and still is) "initialize the data if it does not exist" during container startup.
Let's face it: when we first start a Docker container holding, say, a PostgreSQL 9.4 database, the volume is an empty file system. We then run initdb and set up the database so that it is ready to serve.
The simplest approach is to check whether the data directory has data in it; if it does not, run initdb, apply the most common best practices to the database, and serve it up.

Where is the simplest place to do this? In the entrypoint script of the Docker container, of course.

I made the same mistake in my jkshah/postgres:9.4 image too. In fact, I still see that same pattern in the official postgres Docker image, where it looks for PG_VERSION and, if it does not exist, runs initdb.

if [ ! -s "$PGDATA/PG_VERSION" ]; then
    gosu postgres initdb

    ...
fi

This certainly has advantages:
1. It is very simple to code in the script.
2. It gives a great out-of-the-box experience: you start the container, and it sets itself up and is ready to use.

Let's look at what happens next in real-life enterprise usage.

We ran into scenarios where applications using such databases were running but had lost all the data in them. Hmm, what's going wrong here? The application is working fine, the database is working fine, but the data looks as if it were freshly deployed rather than something that had been running well for 3-5 months.

Let's look at the various activities an enterprise will typically do with such a data volume - the file system on the host where the PostgreSQL containers are running:
1. The host location of the volume will itself be a mounted file system coming off a SAN or some other storage device.
2. The enterprise will be backing up that file system at periodic intervals.
3. In some cases they will be restoring that file system when required.
4. Sometimes the backend storage may have hiccups. (No! That never happens :-) )

In any of the above cases - a mount fails, the wrong file system gets mounted, or a restore fails - you could end up with an empty file system at the volume path. (Not everyone has checks for this.)

Now, when you start the PostgreSQL Docker container on such a volume, you get a brand new, fully initialized database. Most automations that I have seen work such that, in those cases, the application will even fully initialize the database with its own schema and initial data, and then move on as if nothing were wrong.

In that case the application might seem to be working to all probes, until a customer tries to log into the setup and finds that they do not exist in the system.

For DBAs (especially PostgreSQL users), the iron rule is that a "no data" error is better than "wrong/lost data" being served out of a database. For this reason, this particular pattern of database initialization is, in my view, becoming an anti-pattern, especially for Docker containers. A better approach is to have one entrypoint command that knowingly performs setup (initialization), and to run all subsequent starts with another entrypoint command that specifically fails if it does not find the data.
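As a rough sketch of that idea (the "setup" and "start" subcommands here are hypothetical, not from any official image):

#!/bin/bash
# Hypothetical entrypoint sketch: "setup" initializes a fresh volume on purpose,
# while "start" refuses to run if no database is found in the volume.
set -e

case "$1" in
  setup)
    if [ -s "$PGDATA/PG_VERSION" ]; then
      echo "Data already present in $PGDATA, refusing to re-initialize" >&2
      exit 1
    fi
    gosu postgres initdb
    ;;
  start)
    if [ ! -s "$PGDATA/PG_VERSION" ]; then
      echo "No database found in $PGDATA - was the volume mounted correctly?" >&2
      exit 1
    fi
    exec gosu postgres postgres
    ;;
  *)
    exec "$@"
    ;;
esac

With this split, an accidentally empty volume makes the container fail loudly instead of quietly re-initializing the database.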

Of course, this is again a philosophical view on how it should be handled. I would love to hear what people have to say about it.