Mgr Timeout while deploying with ansible-playbook

Bug Report

What happened:

I am trying this for the first time, so I think there may be a configuration error on my part, but so far I can't find one.

TASK [ceph-mgr : wait for all mgr to be up] **********************************************************************************************************************************************************
Friday 24 April 2020  17:41:38 +0530 (0:00:00.037)       0:06:02.798 ********** 
FAILED - RETRYING: wait for all mgr to be up (30 retries left).
FAILED - RETRYING: wait for all mgr to be up (29 retries left).
FAILED - RETRYING: wait for all mgr to be up (28 retries left).
FAILED - RETRYING: wait for all mgr to be up (27 retries left).
FAILED - RETRYING: wait for all mgr to be up (26 retries left).
FAILED - RETRYING: wait for all mgr to be up (25 retries left).
FAILED - RETRYING: wait for all mgr to be up (24 retries left).
FAILED - RETRYING: wait for all mgr to be up (23 retries left).
FAILED - RETRYING: wait for all mgr to be up (22 retries left).
FAILED - RETRYING: wait for all mgr to be up (21 retries left).
FAILED - RETRYING: wait for all mgr to be up (20 retries left).
FAILED - RETRYING: wait for all mgr to be up (19 retries left).
FAILED - RETRYING: wait for all mgr to be up (18 retries left).
FAILED - RETRYING: wait for all mgr to be up (17 retries left).
FAILED - RETRYING: wait for all mgr to be up (16 retries left).
FAILED - RETRYING: wait for all mgr to be up (15 retries left).
FAILED - RETRYING: wait for all mgr to be up (14 retries left).
FAILED - RETRYING: wait for all mgr to be up (13 retries left).
FAILED - RETRYING: wait for all mgr to be up (12 retries left).
FAILED - RETRYING: wait for all mgr to be up (11 retries left).
FAILED - RETRYING: wait for all mgr to be up (10 retries left).
FAILED - RETRYING: wait for all mgr to be up (9 retries left).
FAILED - RETRYING: wait for all mgr to be up (8 retries left).
FAILED - RETRYING: wait for all mgr to be up (7 retries left).
FAILED - RETRYING: wait for all mgr to be up (6 retries left).
FAILED - RETRYING: wait for all mgr to be up (5 retries left).
FAILED - RETRYING: wait for all mgr to be up (4 retries left).
FAILED - RETRYING: wait for all mgr to be up (3 retries left).
FAILED - RETRYING: wait for all mgr to be up (2 retries left).
FAILED - RETRYING: wait for all mgr to be up (1 retries left).
fatal: [root@10.70.59.138 -> root@10.70.59.138]: FAILED! => changed=false 
  attempts: 30
  cmd:
  - ceph
  - --cluster
  - ceph
  - mgr
  - dump
  - -f
  - json
  delta: '0:00:00.229637'
  end: '2020-04-24 17:45:18.220025'
  rc: 0
  start: '2020-04-24 17:45:17.990388'
  stderr: ''
  stderr_lines: <omitted>
  stdout: |2-
  
    {"epoch":1,"active_gid":0,"active_name":"","active_addrs":{"addrvec":[]},"active_addr":":/0","active_change":"0.000000","available":false,"standbys":[],"modules":["iostat","restful"],"available_modules":[],"services":{},"always_on_modules":{"nautilus":["balancer","crash","devicehealth","orchestrator_cli","progress","rbd_support","status","volumes"]}}
  stdout_lines: <omitted>

NO MORE HOSTS LEFT ***********************************************************************************************************************************************************************************

PLAY RECAP *******************************************************************************************************************************************************************************************
root@10.70.59.138          : ok=178  changed=14   unreachable=0    failed=1    skipped=289  rescued=0    ignored=0   
root@10.70.59.139          : ok=96   changed=7    unreachable=0    failed=0    skipped=212  rescued=0    ignored=0   
root@10.70.59.140          : ok=96   changed=7    unreachable=0    failed=0    skipped=212  rescued=0    ignored=0   


INSTALLER STATUS *************************************************************************************************************************************************************************************
Install Ceph Monitor           : Complete (0:02:25)
Install Ceph Manager           : In Progress (0:04:40)
	This phase can be restarted by running: roles/ceph-mgr/tasks/main.yml
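
The `stdout` in the failed task is the output of `ceph mgr dump -f json`, and it reports `"available":false` with an empty `active_name` and no standbys, which suggests that no manager daemon had registered with the cluster. A minimal Python sketch of reading those fields follows; the `summarize_mgr_dump` helper is hypothetical (not part of ceph-ansible), and the sample string is a trimmed copy of the JSON from the log above.

```python
import json

# Trimmed copy of the `ceph mgr dump -f json` stdout from the log above.
MGR_DUMP = (
    '{"epoch":1,"active_gid":0,"active_name":"",'
    '"available":false,"standbys":[]}'
)


def summarize_mgr_dump(raw: str) -> dict:
    """Hypothetical helper: extract the fields that show whether a mgr is active."""
    dump = json.loads(raw)
    return {
        "available": bool(dump.get("available", False)),
        "active_name": dump.get("active_name", ""),
        "standby_count": len(dump.get("standbys", [])),
    }


summary = summarize_mgr_dump(MGR_DUMP)
print(summary)
```

If `available` stays false across all retries, the manager's own service logs on the failing node are probably the next place to look.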

What you expected to happen: This process to go smoothly

How to reproduce it (minimal and precise):

Share your group_vars files, inventory and full ceph-ansible log

Environment:

OS (e.g. from /etc/os-release):
Kernel (e.g. uname -a):
Docker version if applicable (e.g. docker version):
Ansible version (e.g. ansible-playbook --version):
ceph-ansible version (e.g. git head or tag or stable branch):
Ceph version (e.g. ceph -v):

Issue Analytics

State:
Created 3 years ago
Comments:13 (4 by maintainers)

Top GitHub Comments

1 reaction

nizamial09-zz commented, May 23, 2020

This is no longer an issue. It had something to do with the nodes I was using. Thanks @dsavineau

1 reaction

dsavineau commented, Apr 24, 2020

@nizamial09 the default ceph-ansible ansible.cfg sets the Ansible log file to $HOME/ansible/ansible.log [1]. But if that directory doesn't exist, you won't find any log file.

[1] https://github.com/ceph/ceph-ansible/blob/master/ansible.cfg#L12
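
To make sure the log file from the comment above actually gets written, the directory can be created before running the playbook. This is a minimal sketch, assuming the log path is the one quoted above; `ensure_log_dir` is a hypothetical helper, not something ceph-ansible ships.

```python
import os
from pathlib import Path


def ensure_log_dir(log_path: str) -> Path:
    """Hypothetical helper: expand $HOME/~ and create the log file's parent directory."""
    expanded = Path(os.path.expandvars(log_path)).expanduser()
    expanded.parent.mkdir(parents=True, exist_ok=True)
    return expanded


if __name__ == "__main__":
    # Path taken from the comment above; adjust if your ansible.cfg differs.
    log_file = ensure_log_dir("$HOME/ansible/ansible.log")
    print(f"log directory ready: {log_file.parent}")
```

Running this once on the machine that runs ansible-playbook should be enough, since `mkdir` with `exist_ok=True` does nothing when the directory is already there.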
