Hot-swapping disks confuses ceph-ansible
Hi,
We have OSD devices specified via inventory (as `/dev/sda` through `/dev/sdbh`), raw journal devices similarly specified for the NVMe cards, and `raw_multi_journal` set.

This works fine until we hot-plug a disk (i.e. to replace a failed drive): the new drive then appears as `/dev/sdbk`, and the failed drive no longer exists in `/dev/`, so ceph-ansible then fails on that host in the 'ceph-osd : fix partitions gpt header or labels of the osd disks' task.
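The failure mode can be sketched roughly as follows (a minimal illustration only, not ceph-ansible's actual task code; `check_device` is a hypothetical helper): the playbook still iterates over the inventory's device list, but after the hot-swap one of those nodes no longer exists, so any tool that tries to open it (parted, sgdisk) fails.

```shell
#!/bin/sh
# Hedged sketch: the inventory still names the old device (e.g. /dev/sdah),
# but after a hot-swap that node is gone, so opening it fails -- this is
# what produces "Problem opening /dev/sdah for reading! Error is 2" below.
check_device() {
  dev="$1"
  if [ ! -b "$dev" ]; then
    # A missing block device node: exactly what trips the playbook.
    echo "missing: $dev"
    return 1
  fi
  echo "present: $dev"
}

check_device /dev/no-such-disk
```

A pre-flight check like this (skipping or failing fast on absent devices) would at least make the error obvious before the destructive `sgdisk --zap-all` step runs.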
Example failure output:

```
failed: [sto-2-2] (item=[{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-07-26 11:36:12.019353', '_ansible_no_log': False, u'stdout': u'', u'cmd': u'parted --script /dev/sdah print > /dev/null 2>&1', u'rc': 1, 'item': [{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-07-26 11:31:47.372346', '_ansible_no_log': False, u'stdout': u'', u'cmd': u"readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", u'rc': 1, 'item': u'/dev/sdah', u'delta': u'0:00:00.004611', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-07-26 11:31:47.367735', 'failed': False}, u'/dev/sdah'], u'delta': u'0:00:00.004145', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'parted --script /dev/sdah print > /dev/null 2>&1', u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-07-26 11:36:12.015208', 'failed': False}, u'/dev/sdah']) => {"changed": false, "cmd": "sgdisk --zap-all --clear --mbrtogpt -- /dev/sdah || sgdisk --zap-all --clear --mbrtogpt -- /dev/sdah", "delta": "0:00:00.008660", "end": "2017-07-26 13:07:28.664214", "failed": true, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdah print > /dev/null 2>&1", "delta": "0:00:00.004145", "end": "2017-07-26 11:36:12.019353", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdah print > /dev/null 2>&1", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "delta": "0:00:00.004611", "end": "2017-07-26 11:31:47.372346", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": "/dev/sdah", "rc": 1, "start": "2017-07-26 11:31:47.367735", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdah"], "rc": 1, "start": "2017-07-26 11:36:12.015208", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdah"], "rc": 4, "start": "2017-07-26 13:07:28.655554", "stderr": "Problem opening /dev/sdah for reading! Error is 2.\nThe specified file does not exist!\nProblem opening '' for writing! Program will now terminate.\nWarning! MBR not overwritten! Error is 2!\nCaution! Secondary header was placed beyond the disk's limits! Moving the\nheader, but other problems may occur!\nUnable to open device '' for writing! Errno is 2! Aborting write!\nProblem opening /dev/sdah for reading! Error is 2.\nThe specified file does not exist!\nProblem opening '' for writing! Program will now terminate.\nWarning! MBR not overwritten! Error is 2!\nCaution! Secondary header was placed beyond the disk's limits! Moving the\nheader, but other problems may occur!\nUnable to open device '' for writing! Errno is 2! Aborting write!", "stderr_lines": ["Problem opening /dev/sdah for reading! Error is 2.", "The specified file does not exist!", "Problem opening '' for writing! Program will now terminate.", "Warning! MBR not overwritten! Error is 2!", "Caution! Secondary header was placed beyond the disk's limits! Moving the", "header, but other problems may occur!", "Unable to open device '' for writing! Errno is 2! Aborting write!", "Problem opening /dev/sdah for reading! Error is 2.", "The specified file does not exist!", "Problem opening '' for writing! Program will now terminate.", "Warning! MBR not overwritten! Error is 2!", "Caution! Secondary header was placed beyond the disk's limits! Moving the", "header, but other problems may occur!", "Unable to open device '' for writing! Errno is 2! Aborting write!"], "stdout": "Information: Creating fresh partition table; will override earlier problems!\nInformation: Creating fresh partition table; will override earlier problems!", "stdout_lines": ["Information: Creating fresh partition table; will override earlier problems!", "Information: Creating fresh partition table; will override earlier problems!"]}
```
Issue Analytics

- Created: 6 years ago
- Comments: 30 (14 by maintainers)
Top GitHub Comments

On most distributions, udev already generates persistent disk names under `/dev/disk/by-path/`.

@mcv21 Wouldn't it solve your issue if you specified your disk paths as `/dev/disk/by-path/...` instead of `/dev/sda`? If a disk fails and you replace it, udev should create a link with the exact same name, since it'll be plugged in on the same connector.

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.