Full production example: https://github.com/arianitu/postgres-statefulset
We’ve been using Helm charts for Postgres, but the maintainers recently broke backwards compatibility, so I decided to write my own manifests for a Postgres cluster.
I’ve also run into issues with Helm charts where adding extra functionality was a bit of a pain. Besides, writing Kubernetes manifests is quite enjoyable and pretty easy, and it makes you more aware of your cluster setup!
The requirements are:
- Postgres master server
- Postgres replica server that is created from the master server
- Use Kubernetes StatefulSets
- Use the official Postgres image
I’ve decided to create two StatefulSets: one for the Master server and one for the Replica server. You could probably build this out of a single StatefulSet where the Master and Replicas automatically discover their roles, but that seemed more complicated.
Having two StatefulSets lets you spin up N master servers and N replica servers.
Configuration
Our configuration is based on the official documentation of setting up streaming replication on Postgres. You can find the documentation here: https://wiki.postgresql.org/wiki/Streaming_Replication
The steps are:
- Create a replication user that the Replica server uses to pull data from the Master
- Configure the Master server
- Configure the Replica server
- Allow access to the Master server from the Replica (via pg_hba.conf)
Configuration files will be stored in a Kubernetes ConfigMap, while any database credentials will be stored in a Secret.
Create Replica User
So, starting with the replica user: I decided to use a feature of the official image that runs any SQL and shell scripts it finds under /docker-entrypoint-initdb.d when the database is first initialized.
ConfigMap (create-replica-user.sh)
#!/bin/bash
set -e
psql -v ON_ERROR_STOP=1 --username "$POSTGRES_USER" --dbname "$POSTGRES_DB" <<-EOSQL
CREATE ROLE replication WITH REPLICATION PASSWORD '$REPLICATION_PASSWORD' LOGIN;
EOSQL
This file is mounted at /docker-entrypoint-initdb.d/create-replica-user.sh in our Master StatefulSet.
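Once the master pod is up, you can sanity-check that the init script actually ran (the pod name postgres-0 comes from the Master StatefulSet defined later in this article):

```shell
# Scripts in /docker-entrypoint-initdb.d only run when the data directory is
# empty, so this role exists only if the first initialization succeeded.
kubectl exec -it postgres-0 -- psql -U postgres -c '\du replication'
```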
Configuring Postgres.conf
Both the Master StatefulSet and the Replica StatefulSet share the same postgres.conf. Copy over your existing postgres.conf (the full example repository linked at the top contains one) and add this to the end of it:
ConfigMap (postgres.conf)
include_if_exists = 'master.conf'
include_if_exists = 'replica.conf'
Since include_if_exists silently skips a file that isn’t present, both StatefulSets can share this postgres.conf; each one mounts only the role-specific file it needs.
Configure the Master server
ConfigMap (master.conf):
# To enable read-only queries on a standby server, wal_level must be set to
# "hot_standby". But you can choose "archive" if you never connect to the
# server in standby mode.
wal_level = hot_standby
# Set the maximum number of concurrent connections from the standby servers.
max_wal_senders = 5
# To prevent the primary server from removing the WAL segments required for
# the standby server before shipping them, set the minimum number of segments
# retained in the pg_xlog directory. At least wal_keep_segments should be
# larger than the number of segments generated between the beginning of
# online-backup and the startup of streaming replication. If you enable WAL
# archiving to an archive directory accessible from the standby, this may
# not be necessary.
wal_keep_segments = 32
# Enable WAL archiving on the primary to an archive directory accessible from
# the standby. If wal_keep_segments is a high enough number to retain the WAL
# segments required for the standby server, this is not necessary.
#archive_mode = on
#archive_command = 'cp %p /path_to/archive/%f'
Configure the Replica server
Configuration of the Replica server is very straightforward. We need to set hot_standby = on and we’re done.
ConfigMap (replica.conf)
hot_standby = on
Configure pg_hba.conf to allow Replica to connect to Master
ConfigMap (pg_hba.conf)
# TYPE DATABASE USER ADDRESS METHOD
# "local" is for Unix domain socket connections only
local all all trust
# IPv4 local connections:
host all all 127.0.0.1/32 trust
# IPv6 local connections:
host all all ::1/128 trust
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all trust
host replication all 127.0.0.1/32 trust
host replication all ::1/128 trust
host replication replication all md5
host all all all md5
This line allows the replication user to connect and start replicating:
host replication replication all md5
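To check that this rule actually admits the replica, you can open a physical replication connection by hand from inside the cluster (the host name postgres and the password are assumptions based on the Service and Secret used later in this article):

```shell
# replication=1 requests a walsender connection, which is exactly what the
# "host replication replication all md5" line matches.
PGPASSWORD=replica-password psql "host=postgres user=replication replication=1" -c "IDENTIFY_SYSTEM;"
```

If the pg_hba.conf rule is wrong, this fails with a “no pg_hba.conf entry” error instead of printing the system identifier.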
Creating all above files in a ConfigMap
It’s up to you how you create your ConfigMap, but I usually put all the files in a config folder and create a small script that creates a ConfigMap based on the files in that folder.
The script looks like this:
create_configmap.sh:
kubectl create configmap postgres --from-file=postgres.conf --from-file=master.conf --from-file=replica.conf --from-file=pg_hba.conf --from-file=create-replica-user.sh
I then run the script with ./create_configmap.sh and we have all the configuration ready to go!
Configuring database user credentials
In Kubernetes, sensitive information goes in a Secret. This is where you put your Postgres user’s password and also the replication user’s password. The config looks like this:
secret.yaml:
apiVersion: v1
kind: Secret
metadata:
name: postgres
type: Opaque
stringData:
password: master-password
replicaPassword: replica-password
Apply it:
kubectl apply -f secret.yaml
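A side note on the stringData field used above: it accepts plain text, and Kubernetes base64-encodes it into the data field for you. If you ever write data directly instead, you must encode the values yourself:

```shell
# stringData: password: master-password
# is equivalent to
# data: password: bWFzdGVyLXBhc3N3b3Jk
echo -n "master-password" | base64   # -> bWFzdGVyLXBhc3N3b3Jk
```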
Master StatefulSet
We now need to create the Master StatefulSet. It’s pretty standard: we’re using the postgres:10.5 image and volume mounting all the configuration we created above.
You can configure how much storage the database gets and how many cluster resources it may consume; look for resources and volumeClaimTemplates to adjust those. The manifest looks long and complicated, but it’s actually very simple.
The livenessProbe tells Kubernetes to restart the container when the database stops responding, while the readinessProbe controls whether the pod receives traffic.
That’s all there is to it really!
statefulset-master.yml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres
spec:
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app: postgres
serviceName: postgres
replicas: 1
template:
metadata:
labels:
app: postgres
spec:
volumes:
- name: postgres-config
configMap:
name: postgres
terminationGracePeriodSeconds: 10
containers:
- name: postgres
image: postgres:10.5
args: ['-c', 'config_file=/etc/postgres.conf', '-c', 'hba_file=/etc/pg_hba.conf']
imagePullPolicy: IfNotPresent
ports:
- name: postgres
containerPort: 5432
protocol: TCP
resources:
requests:
cpu: 100m
memory: 256Mi
env:
- name: POSTGRES_USER
value: postgres
- name: PGUSER
value: postgres
- name: POSTGRES_DB
value: postgres
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: postgres
- name: REPLICATION_PASSWORD
valueFrom:
secretKeyRef:
key: replicaPassword
name: postgres
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
livenessProbe:
exec:
command:
- sh
- -c
- exec pg_isready --host $POD_IP
failureThreshold: 6
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
exec:
command:
- sh
- -c
- exec pg_isready --host $POD_IP
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
volumeMounts:
- mountPath: /var/lib/postgresql/data/pgdata
name: postgres
subPath: postgres-db
- name: postgres-config
mountPath: /etc/postgres.conf
subPath: postgres.conf
- name: postgres-config
mountPath: /etc/master.conf
subPath: master.conf
- name: postgres-config
mountPath: /etc/pg_hba.conf
subPath: pg_hba.conf
- name: postgres-config
mountPath: /docker-entrypoint-initdb.d/create-replica-user.sh
subPath: create-replica-user.sh
volumeClaimTemplates:
- metadata:
name: postgres
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "standard"
resources:
requests:
storage: 1Gi
Apply the StatefulSet to the cluster:
kubectl apply -f statefulset-master.yml
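Note that the StatefulSet sets serviceName: postgres, and the Replica below reaches the master at host postgres, so a Service with that name has to exist. A minimal headless Service sketch (adjust names and ports to your setup):

```yaml
apiVersion: v1
kind: Service
metadata:
  name: postgres
spec:
  clusterIP: None      # headless: DNS resolves straight to the pod IP
  selector:
    app: postgres
  ports:
    - name: postgres
      port: 5432
```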
Wait for the Master server to be ready before creating the Replica server.
Replica StatefulSet
This file is very similar to the one above, with some key differences: we volume mount replica.conf instead of master.conf, and we don’t mount pg_hba.conf or create-replica-user.sh.
We also add an initContainer that clones the master’s data using pg_basebackup.
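For reference, the -R flag passed to pg_basebackup below writes a recovery.conf into the cloned data directory so that Postgres 10 starts up as a standby. It looks roughly like this (the exact connection string depends on your setup):

```
standby_mode = 'on'
primary_conninfo = 'host=postgres user=replication password=...'
```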
statefulset-replica.yml:
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: postgres-replica
spec:
updateStrategy:
type: RollingUpdate
selector:
matchLabels:
app: postgres-replica
serviceName: postgres-replica
replicas: 1
template:
metadata:
labels:
app: postgres-replica
spec:
volumes:
- name: postgres-config
configMap:
name: postgres
terminationGracePeriodSeconds: 10
initContainers:
- name: setup-replica-data-directory
image: postgres:10.5
env:
- name: PGPASSWORD
valueFrom:
secretKeyRef:
key: replicaPassword
name: postgres
command:
- sh
- -c
- |
if [ -z "$(ls -A /var/lib/postgresql/data/pgdata)" ]; then
echo "Running pg_basebackup to catch up replication server...";
pg_basebackup -R -h postgres -D /var/lib/postgresql/data/pgdata -P -U replication;
chown -R postgres:postgres /var/lib/postgresql/data/pgdata;
else
echo "Skipping pg_basebackup because directory is not empty";
fi
volumeMounts:
- mountPath: /var/lib/postgresql/data/pgdata
name: postgres-replica
subPath: postgres-db
containers:
- name: postgres-replica
image: postgres:10.5
args: ['-c', 'config_file=/etc/postgres.conf']
imagePullPolicy: IfNotPresent
ports:
- name: postgres-rep
containerPort: 5432
protocol: TCP
resources:
requests:
cpu: 100m
memory: 256Mi
env:
- name: POSTGRES_USER
value: postgres
- name: PGUSER
value: postgres
- name: POSTGRES_DB
value: postgres
- name: PGDATA
value: /var/lib/postgresql/data/pgdata
- name: POSTGRES_PASSWORD
valueFrom:
secretKeyRef:
key: password
name: postgres
- name: POD_IP
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: status.podIP
livenessProbe:
exec:
command:
- sh
- -c
- exec pg_isready --host $POD_IP
failureThreshold: 6
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
readinessProbe:
exec:
command:
- sh
- -c
- exec pg_isready --host $POD_IP
failureThreshold: 3
initialDelaySeconds: 5
periodSeconds: 5
successThreshold: 1
timeoutSeconds: 3
volumeMounts:
- mountPath: /var/lib/postgresql/data/pgdata
name: postgres-replica
subPath: postgres-db
- name: postgres-config
mountPath: /etc/postgres.conf
subPath: postgres.conf
- name: postgres-config
mountPath: /etc/replica.conf
subPath: replica.conf
volumeClaimTemplates:
- metadata:
name: postgres-replica
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: "ssd"
resources:
requests:
storage: 1Gi
Apply the StatefulSet to the cluster:
kubectl apply -f statefulset-replica.yml
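Once the replica pod is running, you can confirm that streaming replication is active by querying the master (pod name from the Master StatefulSet above):

```shell
# Each connected standby shows up as one row; state should be "streaming".
kubectl exec -it postgres-0 -- psql -U postgres -c 'SELECT client_addr, state FROM pg_stat_replication;'
```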
Getting your data in
Getting your data into the cluster is quite easy using kubectl cp. What you need to do is:
- kubectl cp dump.sql postgres-0:dump.sql
- kubectl exec -it postgres-0 -- /bin/bash
- psql < dump.sql
Now you can connect to the replica and verify it’s replicating all your data:
- kubectl exec -it postgres-replica-0 -- /bin/bash
- psql
- \dt
Notes
If you’re importing a database from pg_dumpall, you must make sure that the replication user exists in the dump and that the replication user’s password matches the one set in the secret.yaml file above.
This is because the replica server uses those credentials to clone the data, and they must match.