Wednesday, July 25, 2018

Loading data in PostgreSQL 11 Beta Using Native Logical Replication from PostgreSQL 10 in Amazon RDS

In the last blog entry,  I talked about creating two instances of PostgreSQL 11 Beta in Amazon RDS Database Preview Environment and setting up native logical replication. Today, Amazon RDS announced support for PostgreSQL 10.4 with native logical replication.  Let's see how to use this new support to replicate data from PostgreSQL 10 in Amazon RDS into PostgreSQL 11 Beta instances in  preview environment.

I  start with a new PostgreSQL 10.4 instance in Amazon RDS and populated it with data from an older dataset of IMDB.

benchdb-> \d
                          List of relations
 Schema |                 Name                  |   Type   |  Owner
--------+---------------------------------------+----------+---------
 public | acted_in                              | table    | pgadmin
 public | acted_in_idacted_in_seq               | sequence | pgadmin
 public | actors                                | table    | pgadmin
 public | aka_names                             | table    | pgadmin
 public | aka_names_idaka_names_seq             | sequence | pgadmin
 public | aka_titles                            | table    | pgadmin
 public | aka_titles_idaka_titles_seq           | sequence | pgadmin
 public | genres                                | table    | pgadmin
 public | keywords                              | table    | pgadmin
 public | movies                                | table    | pgadmin
 public | movies_genres                         | table    | pgadmin
 public | movies_genres_idmovies_genres_seq     | sequence | pgadmin
 public | movies_keywords                       | table    | pgadmin
 public | movies_keywords_idmovies_keywords_seq | sequence | pgadmin
 public | series                                | table    | pgadmin
(15 rows)

This "production" PostgreSQL 10 database also has data in it.

benchdb=> select count(*) from acted_in;
 count
--------
 618706
(1 row)

benchdb=> select count(*) from movies;
 count
--------
 183510
(1 row)

benchdb=> select count(*) from series;
 count
--------
 162498
(1 row)


In order to prepare PostgreSQL 10 instance in Amazon RDS for logical replication, we need to verify that rds.logical_replication database parameter is enabled. If it is not enabled, you can create a customer parameter group with rds.logical_replication enabled and the parameter group assigned to the database instance. In my case I had already used a custom parameter group with logical replication enabled.

benchdb=> show rds.logical_replication;
 rds.logical_replication
-------------------------
 on
(1 row)

In order to use logical replication,  a replication user to be created  in PostgreSQL 10 instance that will be used to connect from PostgreSQL 11 instance. In case of Amazon RDS, that can be done by granting the rds_replication role to the user.

benchdb=> CREATE USER pg11repluser WITH password 'SECRET';
CREATE ROLE
benchdb=> GRANT rds_replication TO pg11repluser;
GRANT ROLE

For security purpose, it is better that the replication user only has SELECT permissions on the tables to be replicated.

benchdb=> GRANT SELECT ON ALL TABLES IN SCHEMA public TO pg11repluser;
GRANT

The final step inside the database is to create  a publication pgprod10 to indicate which tables need to be replicated. An easier way to select all tables is as follows:

benchdb=> CREATE PUBLICATION pgprod10 FOR ALL TABLES;
CREATE PUBLICATION


One thing to note here is to edit  the inbound rules of the security group of the production instance to allow the PostgreSQL 11 Beta instance to connect.

On PostgreSQL 11 Beta instance, first thing is to recreate schema. We use pg_dump for this purpose only to copy the schema over from PostgreSQL 10 instance

$ pg_dump -s  -h pg104.XXXXX.us-east-2.rds.amazonaws.com -U pgadmin benchdb > movies_schema.sql


Load the schema into PostgreSQL 11 using the psql client

$ psql -h pg11from10.XXXXX.us-east-2.rds-preview.amazonaws.com -U pgadmin -d benchdb -f movies_schema.sql

Note you might see errors for GRANT statements if the same users are not defined in the new instance. It is okay to ignore these messages.

ERROR:  role "pg11repluser" does not exist

We are now ready to create the subscription on PostgreSQL 11 Beta. We verify that there are no rows in this case and then confirm that we get all expected rows after the subscription is created.

benchdb=> select count(*) from acted_in;
 count
-------
     0
(1 row)

benchdb=> select count(*) from movies;
 count
-------
     0
(1 row)

benchdb=> select count(*) from series;
 count
-------
     0
(1 row)

benchdb=> CREATE SUBSCRIPTION pg11beta1 CONNECTION 'host=pg104.XXXXX.us-east-2.rds.amazonaws.com dbname=benchdb user=pg11repluser password=SECRET' PUBLICATION pgprod10;
NOTICE:  created replication slot "pg11beta1" on publisher
CREATE SUBSCRIPTION
benchdb=> select count(*) from acted_in;
 count
--------
 618706
(1 row)

benchdb=> select count(*) from movies;
 count
--------
 183510
(1 row)

benchdb=> select count(*) from series;
 count
--------
 162498
(1 row)


benchdb=>

With the new native logical replication support in PostgreSQL 10 in Amazon RDS, it is now easy to replicate the data into PostgreSQL 11 Beta instance in Amazon RDS Database Preview Environment.  It can also be used to replicate data to/from database instances deployed outside of Amazon RDS.

This blog represents my own view points and not of my employer, Amazon Web Services.


2 comments:

MaksimMB said...
This comment has been removed by the author.
rajeswari said...

Hi, Great Article

Live GoLogica Generative AI training
https://www.gologica.com/master-program/generative-ai-training/