Postmaster not starting

Hello,

We're trying to deploy Harbor on a Kubernetes cluster using the Zalando Postgres Operator. We deployed the operator, and it seems to be running fine and detects when a new cluster is to be managed.

When we deploy an HA PostgreSQL cluster with Spilo (image: spilo-14:2.1-p3) and the following config:

apiVersion: acid.zalan.do/v1
kind: postgresql
metadata:
  creationTimestamp: "2022-04-11T08:39:32Z"
  generation: 1
  name: harbor-postgresql
  namespace: vanillastack-harbor
  resourceVersion: "197604124"
  uid: <redacted>
spec:
  databases:
    harbor: harbor
    notary_server: harbor
    notary_signer: harbor
    registry: harbor
  enableLogicalBackup: true
  logicalBackupSchedule: 30 */12 * * *
  numberOfInstances: 2
  postgresql:
    parameters:
      max_connections: "400"
    version: "14"
  resources:
    limits:
      cpu: 750m
      memory: 1.5Gi
    requests:
      cpu: 100m
      memory: 1Gi
  teamId: harbor
  users:
    harbor: []
    postgres:
    - superuser
    - createdb
  volume:
    size: 20Gi

it shows both pods as running, but the logs show that it fails to bootstrap the cluster:

2022-04-11 08:42:52,811 WARNING: Kubernetes RBAC doesn't allow GET access to the 'kubernetes' endpoint in the 'default' namespace. Disabling 'bypass_api_service'.
2022-04-11 08:42:52,822 INFO: No PostgreSQL configuration items changed, nothing to reload.
2022-04-11 08:42:52,824 INFO: Lock owner: None; I am harbor-postgresql-0
2022-04-11 08:42:52,842 INFO: trying to bootstrap a new cluster
The files belonging to this database system will be owned by user "postgres".
This user must also own the server process.

The database cluster will be initialized with locale "en_US.utf-8".
The default database encoding has accordingly been set to "UTF8".
The default text search configuration will be set to "english".

Data page checksums are disabled.

creating directory /home/postgres/pgdata/pgroot/data ... ok
creating subdirectories ... ok
selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default time zone ... Etc/UTC
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... ok
syncing data to disk ... ok

Success. You can now start the database server using:

    /usr/lib/postgresql/14/bin/pg_ctl -D /home/postgres/pgdata/pgroot/data -l logfile start

2022-04-11 08:42:53,887 INFO: postmaster pid=222
2022-04-11 08:42:53 UTC [222]: [1-1] 6253ea0d.de 0     LOG:  Auto detecting pg_stat_kcache.linux_hz parameter...
2022-04-11 08:42:53 UTC [222]: [2-1] 6253ea0d.de 0     LOG:  pg_stat_kcache.linux_hz is set to 1000000
/var/run/postgresql:5432 - no response
/var/run/postgresql:5432 - no response
2022-04-11 08:42:55,973 ERROR: postmaster is not running
2022-04-11 08:42:55,978 INFO: removing initialize key after failed attempt to bootstrap the cluster
2022-04-11 08:42:55,985 INFO: renaming data directory to /home/postgres/pgdata/pgroot/data_2022-04-11-08-42-55
Traceback (most recent call last):
  File "/usr/local/bin/patroni", line 11, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 171, in main
    return patroni_main()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 139, in patroni_main
    abstract_main(Patroni, schema)
  File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 100, in abstract_main
    controller.run()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 109, in run
    super(Patroni, self).run()
  File "/usr/local/lib/python3.6/dist-packages/patroni/daemon.py", line 59, in run
    self._run_cycle()
  File "/usr/local/lib/python3.6/dist-packages/patroni/__init__.py", line 112, in _run_cycle
    logger.info(self.ha.run_cycle())
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1471, in run_cycle
    info = self._run_cycle()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1345, in _run_cycle
    return self.post_bootstrap()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1238, in post_bootstrap
    self.cancel_initialization()
  File "/usr/local/lib/python3.6/dist-packages/patroni/ha.py", line 1231, in cancel_initialization
    raise PatroniFatalException('Failed to bootstrap cluster')
patroni.exceptions.PatroniFatalException: 'Failed to bootstrap cluster'
/run/service/patroni: finished with code=1 signal=0
/run/service/patroni: sleeping 120 seconds

The logs say that postmaster is not running, and indeed, when listing the processes in the pod, the postmaster is missing:

    PID TTY      STAT   TIME COMMAND
      1 ?        Ss     0:00 /usr/bin/dumb-init -c --rewrite 1:0 -- /bin/sh /launch.s
      7 ?        S      0:00 /bin/sh /launch.sh
     31 ?        S      0:00 /usr/bin/runsvdir -P /etc/service
     32 ?        Ss     0:00 runsv pgqd
     33 ?        Ss     0:00 runsv patroni
     35 ?        S      0:00 /bin/bash /scripts/patroni_wait.sh --role master -- /usr
    288 pts/0    Ss     0:00 bash
    312 ?        S      0:00 sleep 60
    317 pts/0    R+     0:00 ps -ax

Do you have any idea why this might be happening? We're running Kubernetes 1.23.5. I should also note that we have a very similar cluster setup where everything works fine, and we can't figure out why it doesn't deploy on this cluster.

If you need more information, please let me know.

Top GitHub Comments

2 reactions

FactorT commented, Jul 11, 2022

@haslersn there is a fix in Spilo 2.1-p6 (https://github.com/zalando/spilo/releases/tag/2.1-p6): “Compatibility with cgroup v2 when figuring out memory limit and auto-calculating shared_buffers size.”
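
To illustrate what that release note is about, here is a minimal Python sketch of reading a container memory limit under both cgroup versions and deriving a shared_buffers value from it. This is not Spilo's actual code: the function names and the 20% sizing fraction are assumptions made for illustration. Only the two file paths are the standard cgroup v2 and v1 locations.

```python
from pathlib import Path
from typing import Optional

# Standard locations: cgroup v2 exposes memory.max, cgroup v1 exposes
# memory.limit_in_bytes under the memory controller.
CGROUP_V2_LIMIT = Path("/sys/fs/cgroup/memory.max")
CGROUP_V1_LIMIT = Path("/sys/fs/cgroup/memory/memory.limit_in_bytes")


def parse_limit(raw: str) -> Optional[int]:
    """Return the limit in bytes, or None when the cgroup reports no limit."""
    raw = raw.strip()
    if raw == "max":  # cgroup v2 spelling of "unlimited"
        return None
    return int(raw)


def read_memory_limit(
    v2: Path = CGROUP_V2_LIMIT, v1: Path = CGROUP_V1_LIMIT
) -> Optional[int]:
    """Try the cgroup v2 file first, then fall back to the cgroup v1 file."""
    for path in (v2, v1):
        if path.is_file():
            return parse_limit(path.read_text())
    return None  # neither file exists, so no limit could be detected


def suggest_shared_buffers_mb(limit_bytes: int, fraction: float = 0.2) -> int:
    """Hypothetical sizing rule: a fixed fraction of the memory limit, in MB."""
    return int(limit_bytes * fraction) // (1024 * 1024)


if __name__ == "__main__":
    limit = read_memory_limit()
    if limit is None:
        print("no memory limit detected")
    else:
        print(f"limit={limit} bytes, shared_buffers={suggest_shared_buffers_mb(limit)}MB")
```

If a tool only checks the cgroup v1 path on a cgroup v2 node, it will not see the pod's limit and may size shared_buffers from a much larger number, which would be consistent with the startup failure described in this issue.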

2 reactions

LiuShuaiyi commented, Apr 22, 2022

I ran into this, and what helped me solve it was raising the log level so that “DEBUG” logs are shown. It turned out that the command to start postgres attempted to use too much shared_buffers memory: https://dba.stackexchange.com/questions/184951/memory-errors-on-startup-in-postgresql-9-6-log-map-hugetlb-failed
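
Once the DEBUG logs reveal the shared_buffers value that postgres was started with, a quick sanity check is to compare it against the pod's memory limit from the manifest (1.5Gi in this issue). The following Python sketch does that comparison. The helper names are hypothetical, and it only handles the binary Kubernetes suffixes and PostgreSQL memory values written with an explicit unit.

```python
import re

# Binary suffixes used by Kubernetes resource quantities (subset).
K8S_SUFFIXES = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3, "Ti": 1024**4}
# Memory units accepted by PostgreSQL configuration parameters.
PG_UNITS = {"B": 1, "kB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def k8s_memory_to_bytes(quantity: str) -> int:
    """Convert a quantity such as '1.5Gi' or '1073741824' to bytes."""
    match = re.fullmatch(r"(\d+(?:\.\d+)?)(Ki|Mi|Gi|Ti)?", quantity.strip())
    if match is None:
        raise ValueError(f"unsupported quantity: {quantity!r}")
    number, suffix = match.groups()
    # A missing suffix means the number is already in bytes.
    return int(float(number) * K8S_SUFFIXES.get(suffix, 1))


def pg_memory_to_bytes(setting: str) -> int:
    """Convert a PostgreSQL memory setting such as '128MB' to bytes."""
    match = re.fullmatch(r"(\d+)\s*(B|kB|MB|GB|TB)", setting.strip())
    if match is None:
        raise ValueError(f"unsupported setting: {setting!r}")
    number, unit = match.groups()
    return int(number) * PG_UNITS[unit]


def shared_buffers_fits(shared_buffers: str, memory_limit: str) -> bool:
    """True when shared_buffers is strictly below the container memory limit."""
    return pg_memory_to_bytes(shared_buffers) < k8s_memory_to_bytes(memory_limit)


if __name__ == "__main__":
    # '12GB' is an illustrative value, not one taken from the logs above.
    for value in ("128MB", "12GB"):
        print(value, "fits in 1.5Gi:", shared_buffers_fits(value, "1.5Gi"))
```

A value that fails this check points at the memory auto-detection rather than at the cluster manifest itself.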
