Deploying an EDB Postgres Distributed example cluster on Docker v5

This quick start uses TPA to set up PGD with an Always-on Single Location architecture using local Docker containers.

Introducing TPA and PGD

We created TPA to make installing and managing various Postgres configurations easily repeatable. TPA orchestrates creating and deploying Postgres. In this quick start, you install TPA first. If you already have TPA installed, you can skip those steps. You can use TPA to deploy various configurations of Postgres clusters.

PGD is a multi-master replicating implementation of Postgres designed for high performance and availability. The installation of PGD is orchestrated by TPA. You will use TPA to generate a configuration file for a PGD demonstration cluster.

This cluster uses local Docker containers to host the cluster's nodes: three replicating database nodes, three cohosted connection proxies, and one backup node. You can then use TPA to provision and deploy the required configuration and software to each node.

This configuration of PGD isn't suitable for production use but can be valuable for testing the functionality and behavior of PGD clusters. You might also find it useful when familiarizing yourself with PGD commands and APIs to prepare for deployment on cloud, VM, or Linux hosts.

Note

This set of steps is specifically for Ubuntu 22.04 LTS on Intel/AMD processors.

Prerequisites

To complete this example, you need a system with enough RAM and free storage. You also need curl and Docker installed.

RAM requirements

You need a minimum of 4GB of RAM on the system. You need this much RAM because you will be running four containers, three of which will be hosting Postgres databases.

Free disk space

You need at least 5GB of free storage, accessible by Docker, to deploy the cluster described by this example. We recommend that you have a bit more than that.
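As a rough pre-flight check, the following sketch compares the system's RAM and free disk space against the figures above. It assumes a Linux host with /proc/meminfo and that Docker stores its data under /var/lib/docker (falling back to / if that directory doesn't exist). The at_least helper is a hypothetical name used only for this illustration.

```shell
#!/usr/bin/env bash
# Rough pre-flight check for the RAM and disk figures quoted in this section.

# at_least ACTUAL REQUIRED: succeeds when ACTUAL >= REQUIRED (both integers).
at_least() {
  [ "$1" -ge "$2" ]
}

required_ram_kb=$((4 * 1024 * 1024))    # 4GB expressed in kB
required_disk_kb=$((5 * 1024 * 1024))   # 5GB expressed in kB

# Assumption: Docker keeps its data under /var/lib/docker; otherwise check /.
docker_dir=/var/lib/docker
[ -d "$docker_dir" ] || docker_dir=/

total_ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
free_disk_kb=$(df -Pk "$docker_dir" | awk 'NR == 2 {print $4}')

if at_least "${total_ram_kb:-0}" "$required_ram_kb"; then
  echo "RAM: ${total_ram_kb} kB (looks sufficient)"
else
  echo "RAM: ${total_ram_kb:-unknown} kB (below the 4GB guideline)"
fi

if at_least "${free_disk_kb:-0}" "$required_disk_kb"; then
  echo "Disk: ${free_disk_kb} kB free on ${docker_dir} (looks sufficient)"
else
  echo "Disk: ${free_disk_kb:-unknown} kB free on ${docker_dir} (below the 5GB guideline)"
fi
```

The kernel reserves some memory, so a machine with exactly 4GB installed typically reports slightly less than that in MemTotal. Treat a RAM warning as a prompt to look more closely rather than as a hard failure.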

The curl utility

You will download and run scripts during this quick start using the curl utility, which might not be installed by default. To ensure that curl is installed, run:

sudo apt update
sudo apt install curl

Docker Engine

You will use Docker containers as the target platform for this PGD deployment. Install Docker Engine:

sudo apt update
sudo apt install docker.io

Running as a non-root user

Once Docker Engine is installed, be sure to add your user to the Docker group:

sudo usermod -aG docker <username>
newgrp docker
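To confirm that the group change is active in your current shell, you can check the groups reported by id. This is a minimal sketch. The list_has_word helper is a hypothetical name introduced here, not part of Docker or TPA.

```shell
#!/usr/bin/env bash
# list_has_word LIST WORD: succeeds if WORD appears as a whole word
# in the space-separated LIST.
list_has_word() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# id -nG prints the names of all groups the current shell belongs to.
if list_has_word "$(id -nG)" docker; then
  echo "This shell is in the docker group."
else
  echo "This shell isn't in the docker group yet. Run newgrp docker or log in again."
fi
```

If the check passes, running docker info without sudo is a further way to confirm that your user can reach the Docker daemon.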

Preparation

EDB account

To install both TPA and PGD, you need an EDB account.

Sign up for a free EDB account if you don't already have one. Signing up gives you a trial subscription to EDB's software repositories.

After you're registered, go to the EDB Repos 2.0 page, where you can obtain your repo token.

On your first visit to this page, select Request Access to generate your repo token. Copy the token using the Copy Token icon, and store it safely.

Setting environment variables

First, set the EDB_SUBSCRIPTION_TOKEN environment variable to the value of your EDB repo token, obtained in the EDB account step.

export EDB_SUBSCRIPTION_TOKEN=<your-repo-token>

You can add this to your .bashrc script or similar shell profile to ensure it's always set.
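Because later steps depend on this variable, it can help to confirm that it's set before continuing. The following sketch reports only the token's length, so the token itself isn't echoed to the terminal. The token_is_set function is a hypothetical helper for this illustration.

```shell
#!/usr/bin/env bash
# token_is_set: succeeds when EDB_SUBSCRIPTION_TOKEN is set and non-empty.
token_is_set() {
  [ -n "${EDB_SUBSCRIPTION_TOKEN:-}" ]
}

if token_is_set; then
  # Print the length only, to avoid leaking the token into logs or scrollback.
  echo "EDB_SUBSCRIPTION_TOKEN is set (${#EDB_SUBSCRIPTION_TOKEN} characters)."
else
  echo "EDB_SUBSCRIPTION_TOKEN isn't set. Export it before configuring the repository."
fi
```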

Configure the repository

All the software needed for this example is available from the EDB Postgres Distributed package repository. The following command downloads and runs a script to configure the EDB Postgres Distributed repository. This repository also contains the TPA packages.

curl -1sLf "https://downloads.enterprisedb.com/$EDB_SUBSCRIPTION_TOKEN/postgres_distributed/setup.deb.sh" | sudo -E bash

Troubleshooting repo access

The script should produce output starting with:

Executing the  setup script for the 'enterprisedb/postgres_distributed' repository ...

If it produces no output or an error, double-check that you entered your token correctly. If the problem persists, contact Support for assistance.
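When the setup script produces no output, one way to narrow down the cause is to request the same URL without piping it to bash and look at the HTTP status code. This is a sketch only. The repo_setup_url function is a hypothetical helper that rebuilds the URL used in the command above, and which status code an invalid token produces isn't documented here.

```shell
#!/usr/bin/env bash
# repo_setup_url TOKEN: print the setup script URL used in this quick start.
repo_setup_url() {
  printf 'https://downloads.enterprisedb.com/%s/postgres_distributed/setup.deb.sh\n' "$1"
}

if [ -n "${EDB_SUBSCRIPTION_TOKEN:-}" ] && command -v curl >/dev/null 2>&1; then
  url=$(repo_setup_url "$EDB_SUBSCRIPTION_TOKEN")
  # -o /dev/null discards the script body; -w prints only the HTTP status code.
  status=$(curl -sL -o /dev/null -w '%{http_code}' "$url")
  echo "HTTP status for the setup script: ${status}"
else
  echo "Set EDB_SUBSCRIPTION_TOKEN and install curl before running this check."
fi
```

A 200 status suggests the token is accepted and the problem lies elsewhere. Any other status is worth mentioning if you contact Support.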

Installing Trusted Postgres Architect (TPA)

You'll use TPA to provision and deploy PGD. If you previously installed TPA, you can move on to the next step. You'll find full instructions for installing TPA in the Trusted Postgres Architect documentation, which we've also included here.

Linux environment

TPA supports several distributions of Linux as a host platform. These examples are written for Ubuntu 22.04, but steps are similar for other supported platforms.

Install the TPA package

sudo apt install tpaexec

Configuring TPA

You now need to run TPA's setup step, which configures the Python environment TPA runs in. Call tpaexec with the setup command, and then add TPA's bin directory to your PATH:

sudo /opt/EDB/TPA/bin/tpaexec setup
export PATH=$PATH:/opt/EDB/TPA/bin

You can add the export command to your shell's profile.
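If you add the export to a profile that might be sourced more than once, the directory can end up on PATH repeatedly. The following sketch appends it only when it's missing. The append_path_once function is a hypothetical helper, not something TPA provides.

```shell
#!/usr/bin/env bash
# append_path_once DIR: add DIR to the end of PATH unless it's already there.
append_path_once() {
  case ":$PATH:" in
    *":$1:"*) ;;                # already present, nothing to do
    *) PATH="$PATH:$1" ;;
  esac
  export PATH
}

append_path_once /opt/EDB/TPA/bin

# command -v reports where the shell would find tpaexec, if anywhere.
if command -v tpaexec >/dev/null 2>&1; then
  echo "tpaexec found at $(command -v tpaexec)"
else
  echo "tpaexec isn't on PATH. Check that the tpaexec package is installed."
fi
```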

Testing the TPA installation

You can verify TPA is correctly installed by running selftest:

tpaexec selftest

TPA is now installed.

Installing PGD using TPA

Generating a configuration file

Run the tpaexec configure command to generate a configuration folder:

tpaexec configure democluster \
  --architecture PGD-Always-ON \
  --platform docker \
  --edb-postgres-advanced 16 \
  --redwood \
  --location-names dc1 \
  --pgd-proxy-routing local \
  --no-git \
  --hostnames-unsorted \
  --keyring-backend legacy

You specify the PGD-Always-ON architecture (--architecture PGD-Always-ON), which sets up the configuration for PGD's Always-on architectures. As part of the default architecture, it configures your cluster with three data nodes, cohosting three PGD Proxy servers, along with a Barman node for backup.

Specify that you're using Docker (--platform docker). By default, TPA uses Rocky Linux as the image for all nodes.

Deployment platforms

Other Linux platforms are supported as deployment targets for PGD. See the EDB Postgres Distributed compatibility table for details.

Observe that you don't have to deploy PGD to the same platform you're using to run TPA!

Specify that the data nodes will be running EDB Postgres Advanced Server v16 (--edb-postgres-advanced 16) with Oracle compatibility (--redwood).

You set the notional location of the nodes to dc1 using --location-names. You then set --pgd-proxy-routing to local so that proxy routing can route traffic to all nodes in each location.

By default, TPA commits configuration changes to a Git repository. For this example, you don't need to do that, so pass the --no-git flag.

You also ask TPA to generate repeatable hostnames for the nodes by passing --hostnames-unsorted. Otherwise, it selects hostnames at random from a predefined list of suitable words.

Finally, --keyring-backend legacy tells TPA to store secrets using the legacy keyring backend, because the version of Ubuntu this example is based on doesn't support the newer keyring backend.

This command creates a subdirectory called democluster in the current working directory. It contains the config.yml configuration file TPA uses to create the cluster. You can view it using:

less democluster/config.yml
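To get a quick summary instead of paging through the whole file, you can list just the node names. This sketch assumes each instance in config.yml is introduced by a line of the form "- Name: <hostname>". Check that against your generated file, as the layout can differ between TPA versions. The list_instance_names function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# list_instance_names FILE: print the value of every "- Name: <value>" line.
# Assumption: instances in config.yml are listed as "- Name: <hostname>".
list_instance_names() {
  awk '$1 == "-" && $2 == "Name:" {print $3}' "$1"
}

config=democluster/config.yml
if [ -f "$config" ]; then
  list_instance_names "$config"
else
  echo "$config not found. Run tpaexec configure first."
fi
```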

Deploying the cluster

You can now deploy the distributed cluster. For Docker deployments, the deploy command both provisions the required Docker containers and deploys the software to those containers:

tpaexec deploy democluster

TPA applies the configuration, installing the needed packages and setting up the actual EDB Postgres Distributed cluster.
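After the deployment finishes, you can check that the node containers are running. The sketch below assumes that TPA names each container after its hostname and uses the three data node names that appear later in this quick start (kaboom, kaftan, and kaolin). The missing_from function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Data node hostnames as they appear later in this quick start.
expected_nodes="kaboom kaftan kaolin"

# missing_from RUNNING EXPECTED: print each name in EXPECTED that isn't in RUNNING.
# Both arguments are space-separated lists.
missing_from() {
  for name in $2; do
    case " $1 " in
      *" $name "*) ;;
      *) echo "$name" ;;
    esac
  done
}

if command -v docker >/dev/null 2>&1; then
  # Assumption: container names match the node hostnames.
  running=$(docker ps --format '{{.Names}}' 2>/dev/null | tr '\n' ' ')
  missing=$(missing_from "$running" "$expected_nodes")
  if [ -z "$missing" ]; then
    echo "All expected data node containers are running."
  else
    echo "Containers not running: $(echo "$missing" | tr '\n' ' ')"
  fi
else
  echo "docker isn't installed or isn't on PATH."
fi
```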

Connecting to the cluster

You're now ready to log in to one of the nodes of the cluster with SSH and then connect to the database. Part of the configuration process is to set up SSH logins for all the nodes, complete with keys. To use the SSH configuration, you need to be in the democluster directory created by the tpaexec configure command earlier:

cd democluster

From there, you can run ssh -F ssh_config <hostname> to establish an SSH connection. You will connect to kaboom, the first database node in the cluster:

ssh -F ssh_config kaboom
Output
[root@kaboom ~]# 

Notice that you're logged in as root on kaboom.

You now need to adopt the identity of the enterprisedb user. This user is preconfigured and authorized to connect to the cluster's nodes.

sudo -iu enterprisedb
Output
enterprisedb@kaboom:~ $

You can now run the psql command to access the bdrdb database:

psql bdrdb
Output
psql (16.2.0, server 16.2.0)
Type "help" for help.

bdrdb=#

You're directly connected to the Postgres database running on the kaboom node and can start issuing SQL commands.

To leave the SQL client, enter exit.

Using PGD CLI

The pgd utility, also known as the PGD CLI, lets you control and manage your EDB Postgres Distributed cluster. It's already installed on the node.

You can use it to check the cluster's health by running pgd cluster show --health:

pgd cluster show --health
Output
Connections
-----------
Checks if all BDR nodes are accessible.

Result: Ok, all BDR nodes are accessible


Raft
----
Raft Consensus status. Checks if all data and witness nodes are participating
in raft and have the same leader.

Result: Ok, raft Consensus is working correctly


Replication Slots
-----------------
Checks if all PGD replication slots are working correctly.

Result: Ok, all PGD replication slots are working correctly


Clock Skew
----------
Clock drift between nodes. Uses raft leader as reference node to calculate
clock drift. High clock drift can affect conflict resolution and potentially
cause inconsistency.

Result: Ok, clock drift is within permissible limit


Versions
--------
Checks if all nodes are running the same PGD version.

Result: Ok, all nodes are running the same PGD version
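For scripted checks, you can filter the health report down to any results that don't start with Ok. This sketch relies only on the "Result:" lines shown in the output above and assumes that any result not beginning with "Ok" needs attention. The unhealthy_results function is a hypothetical helper, not a PGD CLI feature.

```shell
#!/usr/bin/env bash
# unhealthy_results: read a health report on stdin and print every
# "Result:" line that doesn't start with "Result: Ok".
unhealthy_results() {
  awk '/^Result:/ && !/^Result: Ok/'
}

if command -v pgd >/dev/null 2>&1; then
  problems=$(pgd cluster show --health | unhealthy_results)
  if [ -z "$problems" ]; then
    echo "All health checks report Ok."
  else
    echo "Checks needing attention:"
    echo "$problems"
  fi
else
  echo "pgd isn't on PATH. Run this on a cluster node as the enterprisedb user."
fi
```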

Or, you can use pgd nodes list to ask PGD to show you the data-bearing nodes in the cluster:

pgd nodes list
Output
Node Name Group Name   Node Kind Join State Node Status
--------- ------------ --------- ---------- -----------
kaftan    dc1_subgroup data      ACTIVE     Up         
kaolin    dc1_subgroup data      ACTIVE     Up         
kaboom    dc1_subgroup data      ACTIVE     Up 
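The node list can be checked in a script in a similar way. The following sketch counts the rows whose join state is ACTIVE and whose status is Up, based on the column layout shown in the output above. It assumes two header lines and no spaces inside column values. The count_active_up function is a hypothetical helper.

```shell
#!/usr/bin/env bash
# count_active_up: read `pgd nodes list` output on stdin and print the number
# of rows with Join State ACTIVE (column 4) and Node Status Up (column 5).
count_active_up() {
  awk 'NR > 2 && $4 == "ACTIVE" && $5 == "Up" {n++} END {print n + 0}'
}

if command -v pgd >/dev/null 2>&1; then
  echo "Active data nodes: $(pgd nodes list | count_active_up)"
else
  echo "pgd isn't on PATH. Run this on a cluster node as the enterprisedb user."
fi
```

For the demonstration cluster in this quick start, a healthy result is three active nodes.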

Explore your cluster