shrink-osd fails due to missing FSIDs
Bug Report
What happened:
I attempted to shrink a number of OSDs using the shrink-osd.yml playbook. However, the set_fact osd_hosts task fails because it tries to extract a value for the non-existent key osd_fsid from the output of ceph osd find. My cluster was created with the lvm strategy, and my OSDs have FSIDs, as confirmed by running ceph-volume lvm list on the host, but the output of ceph osd find does not include an FSID. As a result, the playbook fails with a templating error:
TASK [set_fact osd_hosts] *******************************************************************************************************
Friday 22 February 2019 23:36:40 +0000 (0:00:00.074) 0:00:06.379 *******
ok: [localhost] => (item={'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2019-02-22 23:36:40.459355', '_ansible_no_log': False, '_ansible_delegated_vars': {'ansible_delegated_host': u'allmight.fc.kj', 'ansible_host': u'allmight.fc.kj'}, u'cmd': [u'ceph', u'--cluster', u'ceph', u'osd', u'find', u'0'], u'rc': 0, u'stdout': u'{\n "osd": 0,\n "ip": "10.1.15.21:6803/1951",\n "crush_location": {\n "datacenter": "fc",\n "host": "allmight",\n "room": "office",\n "root": "default"\n }\n}', 'item': u'0', u'delta': u'0:00:00.314916', '_ansible_item_label': u'0', u'stderr': u'', u'changed': True, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': False, u'_raw_params': u' ceph --cluster ceph osd find 0', u'removes': None, u'argv': None, u'creates': None, u'chdir': None, u'stdin': None}}, 'stdout_lines': [u'{', u' "osd": 0,', u' "ip": "10.1.15.21:6803/1951",', u' "crush_location": {', u' "datacenter": "fc",', u' "host": "allmight",', u' "room": "office",', u' "root": "default"', u' }', u'}'], u'start': u'2019-02-22 23:36:40.144439', '_ansible_ignore_errors': None, 'failed': False})
fatal: [localhost]: FAILED! => {"msg": "Unexpected templating type error occurred on ({{ osd_hosts | default([]) + [ (item.stdout | from_json).crush_location.host, (item.stdout | from_json).osd_fsid ] }}): coercing to Unicode: need string or buffer, list found"}
Output of ceph osd find when run on the host (the osd_fsid key is missing):
ceph --cluster ceph osd find 0
{
    "osd": 0,
    "ip": "10.1.15.21:6803/1951",
    "crush_location": {
        "datacenter": "fc",
        "host": "allmight",
        "room": "office",
        "root": "default"
    }
}
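To illustrate the failure outside Ansible, here is a minimal Python sketch that parses the same ceph osd find output and looks up osd_fsid defensively. The helper name host_and_fsid is hypothetical and is not part of ceph-ansible; it only shows the kind of guarded lookup that avoids a hard failure when the key is absent.

```python
import json

# Output of `ceph --cluster ceph osd find 0` as shown in this report.
FIND_OUTPUT = """
{
    "osd": 0,
    "ip": "10.1.15.21:6803/1951",
    "crush_location": {
        "datacenter": "fc",
        "host": "allmight",
        "room": "office",
        "root": "default"
    }
}
"""


def host_and_fsid(find_stdout):
    """Return (host, osd_fsid) from `ceph osd find` JSON; fsid is None if absent."""
    data = json.loads(find_stdout)
    host = data["crush_location"]["host"]
    # .get() returns None instead of raising when osd_fsid is not reported.
    return host, data.get("osd_fsid")


host, fsid = host_and_fsid(FIND_OUTPUT)
if fsid is None:
    print(f"osd_fsid missing for the OSD on host {host}")
else:
    print(f"{host}: {fsid}")
```

With the output from this cluster, the lookup takes the "missing" branch, which matches the playbook finding no osd_fsid to work with.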
What you expected to happen: The playbook to execute and remove the OSDs specified.
How to reproduce it (minimal and precise):
- Check out the v3.2.7 tag.
- Copy the shrink-osd.yml playbook to the main directory.
- Execute the playbook against a Ceph Mimic cluster. Observe the playbook failing because the osd_fsid key is not present in the output of ceph osd find run by a previous task.
This issue appears to have been introduced by the backport of #3515 to v3.2 via #3530. Using the shrink-osd.yml playbook from the v3.2.5 tag appears to work without issue: the playbook executes without errors, and lsblk lists no LVM partitions on the disks previously used for the Ceph OSDs.
Environment:
- OS (e.g. from /etc/os-release): Ubuntu 18.04.2 LTS
- Kernel (e.g. uname -a): 4.15.0-45-generic #48-Ubuntu SMP Tue Jan 29 16:28:13 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
- Docker version if applicable (e.g. docker version): N/A
- Ansible version (e.g. ansible-playbook --version): 2.6.11
- ceph-ansible version (e.g. git head or tag or stable branch): tag 3.2.7
- Ceph version (e.g. ceph -v): ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)
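Since the maintainer comments in this thread indicate that the Ceph-side change was merged after 13.2.4 was cut, a quick version check can hint at whether a cluster is likely affected. The following is a rough Python sketch under that assumption. The function names and the cut-off constant are hypothetical, and what ultimately matters is the version the monitors are running, not the local client.

```python
import re

# Assumption: per the maintainer comment in this thread, the Ceph-side change
# landed after 13.2.4 was cut, so newer Mimic releases are expected (not
# confirmed here) to report osd_fsid.
LAST_MIMIC_WITHOUT_FSID = (13, 2, 4)


def parse_ceph_version(text):
    """Extract (major, minor, patch) from `ceph -v` style output."""
    match = re.search(r"ceph version (\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups())


def mimic_may_lack_osd_fsid(text):
    """True for Mimic (13.x) releases up to and including 13.2.4."""
    version = parse_ceph_version(text)
    return version[0] == 13 and version <= LAST_MIMIC_WITHOUT_FSID


reported = "ceph version 13.2.4 (b10be4d44915a4d78a8e06aa31919e74927b142e) mimic (stable)"
print(mimic_may_lack_osd_fsid(reported))
```

The check deliberately says nothing about Luminous, because the thread does not state which Luminous release carries the change.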
Top GitHub Comments
I’m seeing that the releases are being cut now, so they should be arriving very soon
@dsavineau @leseb the change on the Ceph side was merged into mimic and luminous a bit after the 13.2.4 release was cut. I’ll find out when a point release is going to be cut, which should solve this issue after a monitor upgrade. Otherwise, maybe we can handle this out of band from the official playbook?
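One possible out-of-band approach, given that the reporter could see the FSIDs with ceph-volume lvm list, is to read them on the OSD host instead of from the monitor. The Python sketch below assumes the JSON form of that command (ceph-volume lvm list --format json) maps each OSD id to a list of devices carrying a tags mapping with a ceph.osd_fsid entry. The sample data, the placeholder FSID, and the helper name fsids_by_osd are illustrative assumptions to verify against a real host, not output taken from this cluster.

```python
import json

# Hypothetical, abbreviated sample of `ceph-volume lvm list --format json`.
# The layout (OSD id -> list of devices, each with a "tags" mapping) is an
# assumption to verify on your host; the fsid below is a placeholder.
SAMPLE = """
{
    "0": [
        {
            "type": "block",
            "tags": {
                "ceph.osd_id": "0",
                "ceph.osd_fsid": "00000000-0000-0000-0000-000000000000"
            }
        }
    ]
}
"""


def fsids_by_osd(lvm_list_json):
    """Map OSD id -> osd_fsid, skipping devices that lack the tag."""
    result = {}
    for osd_id, devices in json.loads(lvm_list_json).items():
        for device in devices:
            fsid = device.get("tags", {}).get("ceph.osd_fsid")
            if fsid:
                result[osd_id] = fsid
    return result


print(fsids_by_osd(SAMPLE))
```

A lookup like this would have to run on each OSD host rather than on the machine running the playbook, which is part of why it sits outside the official playbook flow.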