
Kubernetes DCS read timed out

See original GitHub issue

Hi, I’m trying to initialize a cluster with Kubernetes as the DCS for Patroni, but I get this error:

➜  patroni kubectl logs patroni-1           
decompressing spilo image...
2018-01-24 06:54:03,626 - bootstrapping - INFO - Figuring out my environment (Google? AWS? Openstack? Local?)
2018-01-24 06:54:03,631 - bootstrapping - DEBUG - Starting new HTTP connection (1): 169.254.169.254
2018-01-24 06:54:05,636 - bootstrapping - INFO - Could not connect to 169.254.169.254, assuming local Docker setup
2018-01-24 06:54:05,637 - bootstrapping - INFO - No meta-data available for this provider
2018-01-24 06:54:05,637 - bootstrapping - INFO - Looks like your running local
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring pgbouncer
2018-01-24 06:54:05,670 - bootstrapping - INFO - No PGBOUNCER_CONFIGURATION was specified, skipping
2018-01-24 06:54:05,670 - bootstrapping - INFO - Configuring patroni
2018-01-24 06:54:05,684 - bootstrapping - INFO - Writing to file /home/postgres/postgres.yml
2018-01-24 06:54:05,684 - bootstrapping - INFO - Configuring bootstrap
2018-01-24 06:54:05,685 - bootstrapping - INFO - Configuring certificate
2018-01-24 06:54:05,685 - bootstrapping - INFO - Generating ssl certificate
2018-01-24 06:54:05,884 - bootstrapping - DEBUG - b"Generating a 2048 bit RSA private key\n............+++\n......................................................+++\nwriting new private key to '/home/postgres/server.key'\n-----\n"
2018-01-24 06:54:05,884 - bootstrapping - INFO - Configuring crontab
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring wal-e
2018-01-24 06:54:05,885 - bootstrapping - INFO - Configuring pam-oauth2
2018-01-24 06:54:05,885 - bootstrapping - INFO - No PAM_OAUTH2 configuration was specified, skipping
2018-01-24 06:54:05,888 - bootstrapping - INFO - Configuring patronictl
2018-01-24 06:54:06,650 CRIT Supervisor is running as root.  Privileges were not dropped because no user is specified in the config file.  If you intend to run as root, you can set user=root in the config file to avoid this message.
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/cron.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/patroni.conf" during parsing
2018-01-24 06:54:06,651 INFO Included extra file "/etc/supervisor/conf.d/pgq.conf" during parsing
2018-01-24 06:54:06,663 INFO RPC interface 'supervisor' initialized
2018-01-24 06:54:06,663 CRIT Server 'unix_http_server' running without any HTTP authentication checking
2018-01-24 06:54:06,663 INFO supervisord started with pid 1
2018-01-24 06:54:07,669 INFO spawned: 'cron' with pid 24
2018-01-24 06:54:07,671 INFO spawned: 'patroni' with pid 25
2018-01-24 06:54:07,674 INFO spawned: 'pgq' with pid 26
2018-01-24 06:54:08,676 INFO success: cron entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,676 INFO success: patroni entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:08,677 INFO success: pgq entered RUNNING state, process has stayed up for > than 1 seconds (startsecs)
2018-01-24 06:54:12,576 WARNING Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=3.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:12,576 WARNING: Retrying (Retry(total=2, connect=None, read=None, redirect=None, status=None)) after connection broken by 'ReadTimeoutError("HTTPSConnectionPool(host='10.233.0.1', port=443): Read timed out. (read timeout=3.3333333333333335)",)': /api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
2018-01-24 06:54:13,975 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:13,981 INFO: failed to acquire initialize lock
2018-01-24 06:54:25,519 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:25,525 INFO: failed to acquire initialize lock
2018-01-24 06:54:33,572 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:33,579 INFO: failed to acquire initialize lock
2018-01-24 06:54:43,672 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:43,679 INFO: failed to acquire initialize lock
2018-01-24 06:54:53,741 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:54:53,751 INFO: failed to acquire initialize lock
2018-01-24 06:55:03,798 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:03,807 INFO: failed to acquire initialize lock
2018-01-24 06:55:13,848 INFO: Lock owner: None; I am patroni-1
2018-01-24 06:55:13,856 INFO: failed to acquire initialize lock
...
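
For reference, the labelSelector in the failing request URL-decodes to application=patroni,app=patroni,release=patroni,cluster=patroni, so the query the pod is timing out on can be issued by hand with the equivalent kubectl command below; the 3.33-second read timeout presumably comes from retry_timeout: 10 in the generated config further down, split across attempts (that derivation is an assumption, not stated in the logs):

kubectl get endpoints -n default -l application=patroni,app=patroni,release=patroni,cluster=patroni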

But when I shell into the pod, I don’t see any issue with the API server:

root@patroni-1:/home/postgres# KUBE_TOKEN=$(</var/run/secrets/kubernetes.io/serviceaccount/token)
root@patroni-1:/home/postgres# curl -sSk -H "Authorization: Bearer $KUBE_TOKEN" https://10.233.0.1:443/api/v1/namespaces/default/endpoints?labelSelector=application%3Dpatroni%2Capp%3Dpatroni%2Crelease%3Dpatroni%2Ccluster%3Dpatroni
{
  "kind": "EndpointsList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/namespaces/default/endpoints",
    "resourceVersion": "10343553"
  },
  "items": [
    {
      "metadata": {
        "name": "patroni",
        "namespace": "default",
        "selfLink": "/api/v1/namespaces/default/endpoints/patroni",
        "uid": "5cf882de-ff82-11e7-9b4b-005056bb262b",
        "resourceVersion": "9831960",
        "creationTimestamp": "2018-01-22T14:41:42Z",
        "labels": {
          "app": "patroni",
          "application": "patroni",
          "cluster": "patroni",
          "release": "patroni"
        },
        "annotations": {
          "kubectl.kubernetes.io/last-applied-configuration": "{\"apiVersion\":\"v1\",\"kind\":\"Endpoints\",\"metadata\":{\"annotations\":{},\"labels\":{\"app\":\"patroni\",\"application\":\"patroni\",\"cluster\":\"patroni\",\"release\":\"patroni\"},\"name\":\"patroni\",\"namespace\":\"default\"},\"subsets\":[]}\n"
        }
      },
      "subsets": null
    }
  ]
}
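
Note that the curl call succeeded with no read-timeout constraint, while Patroni gave up after roughly 3.33 seconds. A minimal sketch for replaying the same request under the same constraint, reusing the token path and service IP from the curl command above (the snippet is illustrative and not part of the original issue):

# Sketch: replay Patroni's endpoints query with the same ~3.33 s read timeout
import requests

# Same service-account token the curl command above used
with open('/var/run/secrets/kubernetes.io/serviceaccount/token') as f:
    token = f.read()

r = requests.get(
    'https://10.233.0.1:443/api/v1/namespaces/default/endpoints',
    params={'labelSelector': 'application=patroni,app=patroni,release=patroni,cluster=patroni'},
    headers={'Authorization': 'Bearer ' + token},
    verify=False,            # the curl check above used -k as well
    timeout=(None, 10 / 3),  # (connect, read): matches the 3.333... s read timeout in the logs
)
print(r.status_code, len(r.text))

If this also times out intermittently, the problem is API-server latency rather than anything Patroni-specific.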

I tried some other commands:

root@patroni-1:/home/postgres# patronictl -c postgres.yml reinit patroni
+---------+-----------+------+------+-------+-----------+
| Cluster | Member    | Host | Role | State | Lag in MB |
+---------+-----------+------+------+-------+-----------+
| patroni | patroni-0 | None |      |       |   unknown |
| patroni | patroni-1 | None |      |       |   unknown |
| patroni | patroni-2 | None |      |       |   unknown |
+---------+-----------+------+------+-------+-----------+
Which member do you want to reinitialize [patroni-2, patroni-0, patroni-1]? []: patroni-1
Are you sure you want to reinitialize members patroni-1? [y/N]: y
Traceback (most recent call last):
  File "/usr/local/bin/patronictl", line 11, in <module>
    sys.exit(ctl())
  File "/usr/lib/python3/dist-packages/click/core.py", line 716, in __call__
    return self.main(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/core.py", line 696, in main
    rv = self.invoke(ctx)
  File "/usr/lib/python3/dist-packages/click/core.py", line 1060, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/lib/python3/dist-packages/click/core.py", line 889, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/lib/python3/dist-packages/click/core.py", line 534, in invoke
    return callback(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/click/decorators.py", line 27, in new_func
    return f(get_current_context().obj, *args, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 530, in reinit
    r = request_patroni(member, 'post', 'reinitialize', body, auth_header(obj))
  File "/usr/local/lib/python3.5/dist-packages/patroni/ctl.py", line 141, in request_patroni
    data=json.dumps(content) if content else None, timeout=60)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 112, in post
    return request('post', url, data=data, json=json, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/api.py", line 58, in request
    return session.request(method=method, url=url, **kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 508, in request
    resp = self.send(prep, **send_kwargs)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 612, in send
    adapter = self.get_adapter(url=request.url)
  File "/usr/local/lib/python3.5/dist-packages/requests/sessions.py", line 703, in get_adapter
    raise InvalidSchema("No connection adapters were found for '%s'" % url)
requests.exceptions.InvalidSchema: No connection adapters were found for 'b''://b''/reinitialize'

No luck with them either. (The b''://b''/reinitialize URL in the traceback suggests patronictl could not resolve an API URL for any member, which matches the empty Host column in the table above.)

Here is my manifest, adapted from https://github.com/unguiculus/charts/tree/feature/patroni/incubator/patroni:

apiVersion: apps/v1beta1
kind: StatefulSet
metadata:
  name: patroni 
  labels:
    app: patroni
    release: patroni 
    application: patroni
    cluster: patroni
spec:
  serviceName: patroni
  replicas: 3
  template:
    metadata:
      labels:
        app: patroni
        release: patroni
        application: patroni
        cluster: patroni
    spec:
      serviceAccountName: patroni-serviceaccount
      containers:
        - name: spilo
          image: registry.opensource.zalan.do/acid/spilo-10:latest
          imagePullPolicy: Always
          env:
            - name: DEBUG
              value: "true"
            - name: POD_IP
              valueFrom:
                fieldRef:
                  fieldPath: status.podIP
            - name: POD_NAMESPACE
              valueFrom:
                fieldRef:
                  apiVersion: v1
                  fieldPath: metadata.namespace
            - name: DCS_ENABLE_KUBERNETES_API
              value: "true"
            - name: USE_ENDPOINTS
              value: "true"
            - name: PATRONI_KUBERNETES_USE_ENDPOINTS 
              value: "true"
            - name: PATRONI_USE_KUBERNETES
              value: "true"
            - name: PATRONI_KUBERNETES_NAMESPACE
              valueFrom:
                fieldRef:
                  fieldPath: metadata.namespace
            - name: PATRONI_CONFIGURATION
              value: |
                postgresql:
                  bin_dir: /usr/lib/postgresql/10/bin
                kubernetes:
                  labels:
                    app: patroni
                    release: patroni
                    application: patroni
                    cluster: patroni
                  scope_label: cluster
            - name: SCOPE
              value: patroni
            - name: PGPASSWORD_SUPERUSER
              valueFrom:
                secretKeyRef:
                  name: patroni
                  key: password-superuser
            - name: PGPASSWORD_STANDBY
              valueFrom:
                secretKeyRef:
                  name: patroni
                  key: password-standby
            - name: PGROOT
              value: /home/postgres/pgdata
          ports:
            - containerPort: 8008
              name: patroni
              protocol: TCP
            - containerPort: 5432
              name: postgresql
              protocol: TCP
          volumeMounts:
            - name: pg-vol
              mountPath: /home/postgres/pgdata
            - mountPath: /etc/patroni
              name: patroni-config
              readOnly: true
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            - topologyKey: "kubernetes.io/hostname"
              labelSelector:
                matchLabels:
                  app: patroni
                  release: patroni
      volumes:
        - name: patroni-config
          secret:
            secretName: patroni
        - name: pg-vol
          hostPath:
            path: /pintapin/data/postgres
            type: Directory

There were some issues with RBAC, and I used the following role to overcome them:

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: Role
metadata:
  name: patroni-role
  namespace: default
rules:
  - apiGroups:
      - ""
    resources:
      - configmaps
      - pods
      - secrets
      - namespaces
    verbs:
      - get
      - list
  - apiGroups:
      - ""
    resources:
      - configmaps
    verbs:
      - create
  - apiGroups:
      - ""
    resources:
      - endpoints
    verbs:
      - get
      - list
      - watch
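
Worth noting as an editorial aside: the Role above only grants read verbs on endpoints, while Patroni’s Kubernetes DCS takes the leader and initialize locks by writing annotations on Endpoints objects, so it most likely also needs write verbs (create, patch, update) on endpoints and patch on pods; the “failed to acquire initialize lock” messages above would be consistent with that. For completeness, a RoleBinding tying the Role to the manifest’s patroni-serviceaccount would look roughly like this (the binding was not included in the original issue, and the name is illustrative):

apiVersion: rbac.authorization.k8s.io/v1beta1
kind: RoleBinding
metadata:
  name: patroni-rolebinding   # illustrative name, not from the original issue
  namespace: default
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: patroni-role
subjects:
  - kind: ServiceAccount
    name: patroni-serviceaccount
    namespace: default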

And here is the generated Patroni config:

root@patroni-1:/home/postgres# cat postgres.yml 
bootstrap:
  dcs:
    loop_wait: 10
    maximum_lag_on_failover: 33554432
    postgresql:
      parameters:
        archive_mode: 'on'
        archive_timeout: 1800s
        autovacuum_analyze_scale_factor: 0.02
        autovacuum_max_workers: 5
        autovacuum_vacuum_scale_factor: 0.05
        checkpoint_completion_target: 0.9
        hot_standby: 'on'
        log_autovacuum_min_duration: 0
        log_checkpoints: 'on'
        log_connections: 'on'
        log_disconnections: 'on'
        log_line_prefix: '%t [%p]: [%l-1] %c %x %d %u %a %h '
        log_lock_waits: 'on'
        log_min_duration_statement: 500
        log_statement: ddl
        log_temp_files: 0
        max_connections: 266
        max_replication_slots: 5
        max_wal_senders: 5
        tcp_keepalives_idle: 900
        tcp_keepalives_interval: 100
        track_functions: all
        wal_keep_segments: 8
        wal_level: hot_standby
        wal_log_hints: 'on'
      use_pg_rewind: true
      use_slots: true
    retry_timeout: 10
    ttl: 30
  initdb:
  - encoding: UTF8
  - locale: en_US.UTF-8
  - data-checksums
  post_init: /post_init.sh "zalandos"
kubernetes:
  labels:
    app: patroni
    application: patroni
    cluster: patroni
    release: patroni
  pod_ip: 10.233.87.10
  ports:
  - name: postgresql
    port: 5432
  role_label: spilo-role
  scope_label: cluster
  use_endpoints: true
postgresql:
  authentication:
    replication:
      password: '1234567890 '
      username: standby
    superuser:
      password: '1234567890 '
      username: postgres
  bin_dir: /usr/lib/postgresql/10/bin
  connect_address: 10.233.87.10:5432
  data_dir: /home/postgres/pgdata/pgroot/data
  listen: 0.0.0.0:5432
  name: patroni-1
  parameters:
    archive_command: /bin/true
    bg_mon.listen_address: 0.0.0.0
    extwlist.extensions: btree_gin,btree_gist,hstore,intarray,ltree,pgcrypto,pgq,pg_trgm,postgres_fdw,uuid-ossp,hypopg
    log_destination: csvlog
    log_directory: ../pg_log
    log_file_mode: '0644'
    log_filename: postgresql-%u.log
    log_rotation_age: 1d
    log_truncate_on_rotation: 'on'
    logging_collector: 'on'
    shared_buffers: 1995MB
    shared_preload_libraries: bg_mon,pg_stat_statements,pg_cron,set_user,pgextwlist
    ssl: 'on'
    ssl_cert_file: /home/postgres/server.crt
    ssl_key_file: /home/postgres/server.key
  pg_hba:
  - local   all             all                                   trust
  - hostssl all             +zalandos    127.0.0.1/32       pam
  - host    all             all                127.0.0.1/32       md5
  - hostssl all             +zalandos    ::1/128            pam
  - host    all             all                ::1/128            md5
  - hostssl replication     standby all                md5
  - hostnossl all           all                all                reject
  - hostssl all             +zalandos    all                pam
  - hostssl all             all                all                md5
  use_unix_socket: true
restapi:
  connect_address: 10.233.87.10:8008
  listen: 0.0.0.0:8008
scope: patroni

This happens on all pods. I’ve tried running Patroni manually, but it jumps straight to the “failed to acquire initialize lock” message and I don’t see the timeout anymore:

root@patroni-1:/home/postgres# patroni postgres.yml 
2018-01-24 07:26:16,041 INFO: Lock owner: None; I am patroni-1
2018-01-24 07:26:16,058 INFO: failed to acquire initialize lock
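
If the missing write verbs are indeed the culprit, two quick checks suggest themselves (these are editorial suggestions, not from the original issue; the first needs a kubectl/cluster that supports impersonation): ask the API server whether the service account may patch endpoints, and inspect the annotations Patroni should be writing:

kubectl auth can-i patch endpoints -n default --as=system:serviceaccount:default:patroni-serviceaccount
kubectl get endpoints -n default -l cluster=patroni -o yaml

With use_endpoints: true in the config above, a healthy cluster stores the leader and initialize keys under metadata.annotations of those Endpoints objects; here they would presumably be absent.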

I tried to mimic the normal startup, but I can’t do it on Kubernetes:

root@patroni-1:/home/postgres# /launch.sh 
ERROR: Supervisord is already running

The issue is the same on all pods. I also tried to dive into the code, but it was hard for me to follow the breadcrumbs. So what should I do now? Is there a misconfiguration?

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 18 (6 by maintainers)

Top GitHub Comments

2 reactions
k1ng440 commented, Jan 30, 2019

I have upgraded the Helm chart to use the Kubernetes DCS. It seems patronictl is not working anymore.

1 reaction
soltysh commented, Jan 26, 2018

@k1-hedayati would you mind opening a PR adding that role and rolebinding to the k8s template? I must admit I’ve been struggling with this problem (failed pod reads and silent endpoint updates) myself as well.

Read more comments on GitHub >

Top Results From Across the Web

Kubernetes agents are failing with 'SocketTimeoutException'
The exception java.net.SocketTimeoutException: timeout is caused by the read (or request) timeout being exceeded during the connection between the Jenkins ...

Kubernetes: Fixing Delayed Service Endpoint Updates
A few months ago I noticed weird connection timeouts when updating a Deployment within ...

Kubectl: Kubernetes with minikube times out - Stack Overflow
I was getting below error, because minikube VM was short of memory that was allocated. Increasing RAM should solve this issue. Unable to connect ...

Watchdog support — Patroni 2.1.5 documentation
By default accessing DCS is configured to time out after 10 seconds. This means that when DCS is unavailable, for example due to ...

Patroni & etcd in High Availability Environments - Crunchy Data
If the etcd system cannot verify writes before the heartbeats time out, or if the primary instance fails to renew its status as ...
