Thread hang causes "waiting for leader to bootstrap"

Hi,

We hit an issue during “helm install”. After analysis, I think a Python thread may hang and cause “waiting for leader to bootstrap”. I would like to report this issue; I don’t know whether you could do something to improve this part.

Patroni v1.6.3, Python 2.7

— LOG —

2020-03-31T19:07:05.066891092Z Skip service level restore action.
2020-03-31T19:07:05.093194223Z /entrypoint.sh: dir data changed for postgresql
2020-03-31T19:07:05.096706161Z /entrypoint.sh: dir /var/lib/postgresql/data/pgdata changed owner for postgresql
2020-03-31T19:07:05.117268451Z ls: cannot access '/var/lib/postgresql/data/pgdata/pg_replslot/': No such file or directory
2020-03-31T19:07:05.119851734Z /entrypoint.sh: create dir done, uid=26(postgres) gid=26(postgres) groups=26(postgres),0(root)
2020-03-31T19:07:05.685056852Z 2020-03-31 19:07:05,684 INFO: postgres connection_string is postgres://192.168.21.199:5432/postgres
2020-03-31T19:07:05.68508475Z 2020-03-31 19:07:05,684 INFO: No PostgreSQL configuration items changed, nothing to reload.
2020-03-31T19:07:05.686310355Z 2020-03-31 19:07:05,686 INFO: Selected address family is 2
2020-03-31T19:07:05.68791133Z 2020-03-31 19:07:05,687 INFO: Postgres stop: success: True, signaled: False, block_callbacks: False
2020-03-31T19:07:05.688267346Z 2020-03-31 19:07:05,687 INFO: Lock owner: None; I am testapp-db-pg-0
2020-03-31T19:07:05.688286198Z 2020-03-31 19:07:05,688 INFO: waiting for leader to bootstrap
2020-03-31T19:07:15.688363227Z 2020-03-31 19:07:15,687 INFO: Postgres stop: success: True, signaled: False, block_callbacks: False
2020-03-31T19:07:15.688460201Z 2020-03-31 19:07:15,688 INFO: Lock owner: None; I am testapp-db-pg-0

Looking at the source code, I found this in ha.py:

                else:
                    ret = self._async_executor.try_run_async('bootstrap', self.state_handler.bootstrap.bootstrap,
                                                             args=(self.patroni.config['bootstrap'],))
                    return ret or 'trying to bootstrap a new cluster'

async_executor.py

    def run_async(self, func, args=()):
        Thread(target=self.run, args=(func, args)).start()

    def try_run_async(self, action, func, args=()):
        prev = self.schedule(action)
        if prev is None:
            return self.run_async(func, args)
        return 'Failed to run {0}, {1} is already in progress'.format(action, prev)

Since we didn’t see the “trying to bootstrap a new cluster” message in the log, I think the Python thread ran into some kind of run-time problem.
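To check that reasoning, here is a minimal, self-contained sketch of the two quoted methods. `AsyncExecutorSketch`, its simplified `schedule` and `run` methods, and the `bootstrap_status` helper are assumptions written for illustration; they are not Patroni's actual implementation. The sketch shows that `run_async` returns `None`, so the calling code would produce a status string as soon as the thread is started, regardless of what the thread does afterwards.

```python
import threading


class AsyncExecutorSketch:
    """Toy stand-in for the executor quoted above; not Patroni's real class."""

    def __init__(self):
        self._lock = threading.Lock()
        self._scheduled_action = None

    def schedule(self, action):
        # Return the action already in progress, or None if the slot was free
        # and has now been claimed for `action`.
        with self._lock:
            prev = self._scheduled_action
            if prev is None:
                self._scheduled_action = action
            return prev

    def run(self, func, args=()):
        # Run the task and free the slot afterwards, even if the task raises.
        try:
            func(*args)
        finally:
            with self._lock:
                self._scheduled_action = None

    def run_async(self, func, args=()):
        # Thread.start() returns None, so this method returns None as well.
        threading.Thread(target=self.run, args=(func, args)).start()

    def try_run_async(self, action, func, args=()):
        prev = self.schedule(action)
        if prev is None:
            return self.run_async(func, args)
        return 'Failed to run {0}, {1} is already in progress'.format(action, prev)


def bootstrap_status(executor, func, args=()):
    # Mirrors the `ret or '...'` pattern from the ha.py excerpt.
    ret = executor.try_run_async('bootstrap', func, args)
    return ret or 'trying to bootstrap a new cluster'
```

In this sketch, a first call to `bootstrap_status` returns “trying to bootstrap a new cluster” immediately, and a second call made while the first task is still running returns the “already in progress” string. Either way a message comes back without waiting for the thread, which suggests that a hang inside the thread would not by itself hide the message; its absence more likely means this branch was never reached.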

Do you have any suggestions?

BRs, Fan Liu

Top GitHub Comments

CyberDem0n commented, Apr 2, 2020

The only place that keeps the information about the initialized cluster is the configmap or endpoint on K8s, and the /config key for any other DCS. If you still get the waiting for leader to bootstrap message, that means the <cluster-name>-config configmap or endpoint is still there. Nothing else is possible.
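The explanation above can be pictured with a small decision function. This is a simplified model under stated assumptions, not Patroni's code: `bootstrap_message` and its parameters are hypothetical names, and only the two quoted status strings come from this thread.

```python
def bootstrap_message(has_leader, initialize_key, has_bootstrap_section=True):
    """Simplified, hypothetical model of the status a new node reports.

    has_leader            -- whether the DCS currently records a leader
    initialize_key        -- stored cluster marker, or None if the cluster
                             was never initialized (or the marker was removed)
    has_bootstrap_section -- whether the node's config has a bootstrap section
    """
    if has_leader:
        # Placeholder text; the real message is not quoted in this thread.
        return 'clone from existing leader (placeholder)'
    if initialize_key is None and has_bootstrap_section:
        # Nothing marks the cluster as initialized, so this node may
        # try to initialize it.
        return 'trying to bootstrap a new cluster'
    # A marker exists but no leader holds the lock: the node keeps waiting.
    return 'waiting for leader to bootstrap'
```

With `has_leader=False` (the log shows `Lock owner: None`) and a marker left over from an earlier install, this model returns “waiting for leader to bootstrap”, which matches the reported log.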

qingguee commented, Apr 17, 2020

Thanks for the info, @CyberDem0n. You are right, especially on K8s. Restarts can happen for many reasons.

BRs, Fan Liu
