
mgr daemons created by ceph-ansible are not visible to ceph


ansible-github.log

Bug Report

I have 3 mgr daemons running but ceph is not aware of them.

[root@bvcephtest05 ~]# ceph status
  cluster:
    id:     81ae6812-607f-4205-9484-2d063a9c4431
    health: HEALTH_WARN
            clock skew detected on mon.bvcephtest04, mon.bvcephtest05

  services:
    mon: 3 daemons, quorum bvcephtest03,bvcephtest04,bvcephtest05 (age 5m)
    mgr: no daemons active
    osd: 0 osds: 0 up, 0 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   0 B used, 0 B / 0 B avail
    pgs:


[root@bvcephtest05 ~]# systemctl status ceph-mgr
ceph-mgr@bvcephtest05.service  ceph-mgr.target
[root@bvcephtest05 ~]# systemctl status ceph-mgr@bvcephtest05.service
● ceph-mgr@bvcephtest05.service - Ceph cluster manager daemon
   Loaded: loaded (/usr/lib/systemd/system/ceph-mgr@.service; enabled; vendor preset: disabled)
   Active: active (running) since Fri 2020-04-24 11:34:35 BST; 5min ago
 Main PID: 50595 (ceph-mgr)
    Tasks: 21 (limit: 23597)
   Memory: 117.7M
   CGroup: /system.slice/system-ceph\x2dmgr.slice/ceph-mgr@bvcephtest05.service
           └─50595 /usr/bin/ceph-mgr -f --cluster ceph --id bvcephtest05 --setuser ceph --setgroup ceph

Apr 24 11:34:35 bvcephtest05 systemd[1]: Started Ceph cluster manager daemon…
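
Since the daemon is clearly running but the monitors list no active mgr, a hedged set of follow-up checks (assuming the default cluster name ceph and the host names above) is to ask the monitors what they see, confirm an auth key with mgr caps exists for this daemon, and look at the daemon log for authentication or clock errors:

[root@bvcephtest05 ~]# ceph mgr stat                               # what the monitors currently report
[root@bvcephtest05 ~]# ceph auth get mgr.bvcephtest05              # does a key with mgr caps exist for this daemon?
[root@bvcephtest05 ~]# journalctl -u ceph-mgr@bvcephtest05 -n 50   # look for auth or clock errors on startup
[root@bvcephtest05 ~]# chronyc tracking                            # the HEALTH_WARN above also flags clock skew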

What you expected to happen:

The mgr daemons to be detected 😃

How to reproduce it (minimal and precise):

This is how I have defined my inventory. My hosts are all CentOS 8 VMs.

cat  inventory
[mons]
bvcephtest03
bvcephtest04
bvcephtest05

[mgrs]
bvcephtest03
bvcephtest04
bvcephtest05

[osds]
bvcephtest01
bvcephtest03
bvcephtest04
bvcephtest05

[rgws]
bvcephtest01

[grafana-server]
bvcephtest01

I have tried this a number of times and get the same result. I’ve used new VMs and run the purge-cluster playbook to start the entire process from scratch.
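
For reference, a hedged sketch of the invocations involved (paths follow the stable-5.0 layout of ceph-ansible; -vv is only there to capture a fuller log):

ansible-playbook -i inventory infrastructure-playbooks/purge-cluster.yml   # wipe the previous attempt
ansible-playbook -vv -i inventory site.yml.sample                          # redeploy from scratch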

The run fails at the same task every time: TASK [ceph-mgr : wait for all mgr to be up]. My group_vars, inventory, and the full ceph-ansible log (ansible-github.log above) are included in this report.

Environment:

  • OS (e.g. from /etc/os-release):
NAME="CentOS Linux"
VERSION="8 (Core)"
ID="centos"
ID_LIKE="rhel fedora"
VERSION_ID="8"
PLATFORM_ID="platform:el8"
PRETTY_NAME="CentOS Linux 8 (Core)"
ANSI_COLOR="0;31"
CPE_NAME="cpe:/o:centos:centos:8"
HOME_URL="https://www.centos.org/"
BUG_REPORT_URL="https://bugs.centos.org/"

CENTOS_MANTISBT_PROJECT="CentOS-8"
CENTOS_MANTISBT_PROJECT_VERSION="8"
REDHAT_SUPPORT_PRODUCT="centos"
REDHAT_SUPPORT_PRODUCT_VERSION="8"
  • Kernel (e.g. uname -a): Linux bvcephtest05 4.18.0-147.8.1.el8_1.x86_64 #1 SMP Thu Apr 9 13:49:54 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
  • Docker version if applicable (e.g. docker version): NA
  • Ansible version (e.g. ansible-playbook --version): ansible-playbook 2.9.6
  • ceph-ansible version (e.g. git head or tag or stable branch): stable-5.0
  • Ceph version (e.g. ceph -v): ceph version 15.2.1 (9fd2f65f91d9246fae2c841a6222d34d121680ee) octopus (stable)

Group vars: grep -vE '^#' group_vars/all.yml | uniq

---
centos_package_dependencies:
  - epel-release
  - python3-libselinux
ntp_service_enabled: true
ceph_origin: repository
ceph_repository: community
ceph_stable_release: octopus
ceph_stable_redhat_distro: el8
monitor_address_block: 10.64.6.0/24
public_network: 10.64.6.0/24
cluster_network: "{{ public_network | regex_replace(' ', '') }}"
radosgw_address: 10.64.6.11
dashboard_enabled: True
dashboard_protocol: http
dashboard_port: 8443
dashboard_admin_user: admin
dashboard_admin_user_ro: false
dashboard_admin_password: p@ssw0rd
grafana_admin_user: admin
grafana_admin_password: admin
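
Because monitor_address_block and public_network decide which addresses the daemons bind to and register with, one hedged sanity check (plain Ansible ad-hoc, reusing the inventory above) is to confirm every mon/mgr host actually carries an address in 10.64.6.0/24:

ansible -i inventory mons:mgrs -m shell -a "ip -4 -o addr show | grep 10.64.6"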

grep -vE '^#' group_vars/osds.yml | uniq

---
devices:
  - /dev/sdb

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

1 reaction
micmoyles commented, Apr 27, 2020

@dsavineau, thanks for replicating. Over the weekend I altered the config a few times and managed to get something working. It seems my issue was that the 3 mgr daemons were somehow unable to elect an active/standby set among themselves. My hunch is that this was caused by some old configs left on the machines between my installations and purges (the machines were brand new VMs before this). I added a clean machine to the cluster and used it as the only mon/mgr, which allowed the cluster to come up healthy.

If I get some time I’ll try re-installing a mon and mgr back onto the troublesome machines in an effort to understand what the cause of this was. I’ll close this issue for now and put it down to user error. Thanks for the assistance.
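
For anyone following the same hunch about leftover state: a hedged checklist of the usual places stale Ceph configuration and data survive a purge-and-reinstall cycle (default paths, cluster name assumed to be ceph):

ls -l /etc/ceph/                                # old ceph.conf and keyrings
ls -l /var/lib/ceph/mon/ /var/lib/ceph/mgr/     # leftover daemon data directories
ceph-volume lvm list                            # LVM volumes from previous OSD deployments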

0 reactions
dsavineau commented, Apr 24, 2020

I tried to replicate your setup with the same configuration (except the network CIDR) and with firewalld enabled, but everything went well:

TASK [show ceph status for cluster ceph] *****************************************************************************************************************************************************************************************************
task path: /home/ds/Workspace/mgrwait/site.yml.sample:457
Friday 24 April 2020  14:55:52 -0400 (0:00:01.946)       0:21:38.053 ********** 
ok: [bvcephtest03 -> 10.41.11.143] => 
  msg:
  - '  cluster:'
  - '    id:     2c427774-ecef-435f-bbea-48a9ef538bb4'
  - '    health: HEALTH_OK'
  - ' '
  - '  services:'
  - '    mon: 3 daemons, quorum bvcephtest05,bvcephtest03,bvcephtest04 (age 15m)'
  - '    mgr: bvcephtest05(active, starting, since 2s), standbys: bvcephtest04, bvcephtest03'
  - '    osd: 4 osds: 4 up (since 9m), 4 in (since 9m)'
  - '    rgw: 1 daemon active (bvcephtest01.rgw0)'
  - ' '
  - '  task status:'
  - ' '
  - '  data:'
  - '    pools:   5 pools, 105 pgs'
  - '    objects: 189 objects, 5.0 KiB'
  - '    usage:   4.1 GiB used, 196 GiB / 200 GiB avail'
  - '    pgs:     105 active+clean'
  - ' '
META: ran handlers
META: ran handlers

PLAY RECAP ***********************************************************************************************************************************************************************************************************************************
bvcephtest01               : ok=278  changed=60   unreachable=0    failed=0    skipped=350  rescued=0    ignored=0   
bvcephtest03               : ok=351  changed=49   unreachable=0    failed=0    skipped=446  rescued=0    ignored=0   
bvcephtest04               : ok=287  changed=43   unreachable=0    failed=0    skipped=407  rescued=0    ignored=0   
bvcephtest05               : ok=297  changed=45   unreachable=0    failed=0    skipped=404  rescued=0    ignored=0   


INSTALLER STATUS *****************************************************************************************************************************************************************************************************************************
Install Ceph Monitor           : Complete (0:01:52)
Install Ceph Manager           : Complete (0:03:18)
Install Ceph OSD               : Complete (0:02:43)
Install Ceph RGW               : Complete (0:01:17)
Install Ceph Dashboard         : Complete (0:02:11)
Install Ceph Grafana           : Complete (0:03:18)
Install Ceph Node Exporter     : Complete (0:02:00)
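
Since firewalld was enabled in this replication, a hedged check on a cluster where the mgrs never register is to confirm that the firewalld services ceph-ansible normally opens (ceph-mon on the monitors, ceph on the other daemon hosts) are actually in place on each node:

firewall-cmd --state           # confirm firewalld is running
firewall-cmd --list-services   # expect ceph-mon on mons, ceph on mgr/osd/rgw hosts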