
mgr daemons created by ceph-ansible are not visible to ceph


ansible-github.log

Bug Report

I have 3 mgr daemons running but ceph is not aware of them.

[root@bvcephtest05 ~]# ceph status
  cluster:
    id:     81ae6812-607f-4205-9484-2d063a9c4431
    health: HEALTH_WARN
            clock skew detected on mon.bvcephtest04, mon.bvcephtest05

  services:
    mon: 3 daemons, quorum bvcephtest03,bvcephtest04,bvcephtest05 (age 5m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:


[root@bvcephtest05 ~]# systemctl status ceph-mgr
ceph-mgr@bvcephtest05.service  ceph-mgr.target
[root@bvcephtest05 ~]# systemctl status ceph-mgr@bvcephtest05.service
● ceph-mgr@bvcephtest05.service - Ceph cluster manager daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-04-24 11:34:35 BST; 5min ago
 Main PID: 50595 (ceph-mgr)
    Tasks: 21 (limit: 23597)
   Memory: 117.7M
   CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@bvcephtest05.service
           └─50595 /usr/bin/ceph-mgr -f --cluster ceph --id bvcephtest05 --setuser ceph --setgroup ceph

Apr 24 11:34:35 bvcephtest05 systemd[1]: Started Ceph cluster manager daemon…
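
Since the daemon is clearly running but the monitors list no active mgr, a hedged set of follow-up checks (assuming the default cluster name ceph and the host names above) is to ask the monitors what they see, confirm an auth key with mgr caps exists for this daemon, and look at the daemon log for authentication or clock errors:

[root@bvcephtest05 ~]# ceph mgr stat                               # what the monitors currently report
[root@bvcephtest05 ~]# ceph auth get mgr.bvcephtest05              # does a key with mgr caps exist for this daemon?
[root@bvcephtest05 ~]# journalctl -u ceph-mgr@bvcephtest05 -n 50   # look for auth or clock errors on startup
[root@bvcephtest05 ~]# chronyc tracking                            # the HEALTH_WARN above also flags clock skew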

What you expected to happen:

The mgr daemons to be detected 😃

How to reproduce it (minimal and precise):

This is how I have defined my inventory. My hosts are all CentOS 8 VMs.

cat  inventory
[mons]
bvcephtest03
bvcephtest04
bvcephtest05

[mgrs]
bvcephtest03
bvcephtest04
bvcephtest05

[osds]
bvcephtest01
bvcephtest03
bvcephtest04
bvcephtest05

[rgws]
bvcephtest01

[grafana-server]
bvcephtest01

I have tried this a number of times and get the same result. I’ve used new VMs and run the purge-cluster playbook to start the entire process from scratch.
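
For reference, a hedged sketch of the invocations involved (paths follow the stable-5.0 layout of ceph-ansible; -vv is only there to capture a fuller log):

ansible-playbook -i inventory infrastructure-playbooks/purge-cluster.yml   # wipe the previous attempt
ansible-playbook -vv -i inventory site.yml.sample                          # redeploy from scratch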

The run fails at the same task every time: TASK [ceph-mgr : wait for all mgr to be up]. My group_vars, inventory, and the full ceph-ansible log (ansible-github.log above) are included in this report.

Environment:

  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
  • Kernel (e.g. uname -a): Linux bvcephtest05 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version if applicable (e.g. docker version): NA
  • Ansible version (e.g. ansible-playbook --version): ansible-playbook 2.9.6
  • ceph-ansible version (e.g. git head or tag or stable branch): stable-5.0
  • Ceph version (e.g. ceph -v): ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)

Group vars: grep -vE '^#' group_vars/all.yml | uniq

---
centos_package_dependencies:
  - epel-release
  - python3-libselinux
ntp_service_enabled: true
ceph_origin: repository
ceph_repository: community
ceph_stable_release: octopus
ceph_stable_redhat_distro: el8
monitor_address_block: 10.64.6.0/24
public_network: 10.64.6.0/24
cluster_network: "{{ public_network | regex_replace(' ', '') }}"
radosgw_address: 10.64.6.11
dashboard_enabled: True
dashboard_protocol: http
dashboard_port: 8443
dashboard_admin_user: admin
dashboard_admin_user_ro: false
dashboard_admin_password: p@ssw0rd
grafana_admin_user: admin
grafana_admin_password: admin
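
Because monitor_address_block and public_network decide which addresses the daemons bind to and register with, one hedged sanity check (plain Ansible ad-hoc, reusing the inventory above) is to confirm every mon/mgr host actually carries an address in 10.64.6.0/24:

ansible -i inventory mons:mgrs -m shell -a "ip -4 -o addr show | grep 10.64.6"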

grep -vE '^#' group_vars/osds.yml | uniq

---
devices:
  - /dev/sdb

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
micmoyles commented, Apr 27, 2020

@dsavineau, thanks for replicating. Over the weekend I altered the config a few times and managed to get something working. It seems my issue was that the 3 mgr daemons were somehow unable to elect an active/standby set among themselves. My hunch is that this was caused by some old configs left on the machines between my installations and purges (the machines were brand new VMs before this). I added a clean machine to the cluster and used it as the only mon/mgr, which allowed the cluster to come up healthy.

If I get some time I’ll try re-installing a mon and mgr back onto the troublesome machines in an effort to understand what the cause of this was. I’ll close this issue for now and put it down to user error. Thanks for the assistance.
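
For anyone following the same hunch about leftover state: a hedged checklist of the usual places stale Ceph configuration and data survive a purge-and-reinstall cycle (default paths, cluster name assumed to be ceph):

ls -l /etc/ceph/                                # old ceph.conf and keyrings
ls -l /var/lib/ceph/mon/ /var/lib/ceph/mgr/     # leftover daemon data directories
ceph-volume lvm list                            # LVM volumes from previous OSD deployments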

0 reactions
dsavineau commented, Apr 24, 2020

I tried to replicate your setup with the same configuration (except the network CIDR) and with firewalld enabled, but everything went well:

TASK [show ceph status for cluster ceph] *****************************************************************************************************************************************************************************************************
task path: /home/ds/Workspace/mgrwait/site.yml.sample:457
Friday 24 April 2020  14:55:52 -0400 (0:00:01.946)       0:21:38.053 ********** 
ok: [bvcephtest03 -> 10.41.11.143] => 
  msg:
  - '  cluster:'
  - '    id:     2c427774-ecef-435f-bbea-48a9ef538bb4'
  - '    health: HEALTH_OK'
  - ' '
  - '  services:'
  - '    mon: 3 daemons, quorum bvcephtest05,bvcephtest03,bvcephtest04 (age 15m)'
  - '    mgr: bvcephtest05(active, starting, since 2s), standbys: bvcephtest04, bvcephtest03'
  - '    osd: 4 osds: 4 up (since 9m), 4 in (since 9m)'
  - '    rgw: 1 daemon active (bvcephtest01.rgw0)'
  - ' '
  - '  task status:'
  - ' '
  - '  data:'
  - '    pools:   5 pools, 105 pgs'
  - '    objects: 189 objects, 5.0 KiB'
  - '    usage:   4.1 GiB used, 196 GiB / 200 GiB avail'
  - '    pgs:     105 active+clean'
  - ' '
META: ran handlers
META: ran handlers

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
bvcephtest01               : ok=278  changed=60   unreachable=0    failed=0    skipped=350  rescued=0    ignored=0   
bvcephtest03               : ok=351  changed=49   unreachable=0    failed=0    skipped=446  rescued=0    ignored=0   
bvcephtest04               : ok=287  changed=43   unreachable=0    failed=0    skipped=407  rescued=0    ignored=0   
bvcephtest05               : ok=297  changed=45   unreachable=0    failed=0    skipped=404  rescued=0    ignored=0   


INSTALLER STATUS *****************************************************************************************************************************************************************************************************************************
Install Ceph Monitor           : Complete (0:01:52)
Install Ceph Manager           : Complete (0:03:18)
Install Ceph OSD               : Complete (0:02:43)
Install Ceph RGW               : Complete (0:01:17)
Install Ceph Dashboard         : Complete (0:02:11)
Install Ceph Grafana           : Complete (0:03:18)
Install Ceph Node Exporter     : Complete (0:02:00)
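
Since firewalld was enabled in this replication, a hedged check on a cluster where the mgrs never register is to confirm that the firewalld services ceph-ansible normally opens (ceph-mon on the monitors, ceph on the other daemon hosts) are actually in place on each node:

firewall-cmd --state           # confirm firewalld is running
firewall-cmd --list-services   # expect ceph-mon on mons, ceph on mgr/osd/rgw hosts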