
Hot-swapping disks confuses ceph-ansible


Hi,

We have our OSD devices specified via the inventory (/dev/sda through /dev/sdbh), raw journal devices similarly specified for the NVMe cards, and raw_multi_journal set.

This works fine until we hot-plug a disk (e.g. to replace a failed drive). The new drive then appears as /dev/sdbk and the failed drive no longer exists in /dev/, so ceph-ansible fails on that host in the ‘ceph-osd : fix partitions gpt header or labels of the osd disks’ task.

Example barf:

failed: [sto-2-2] (item=[{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-07-26 11:36:12.019353', '_ansible_no_log': False, u'stdout': u'', u'cmd': u'parted --script /dev/sdah print > /dev/null 2>&1', u'rc': 1, 'item': [{'_ansible_parsed': True, 'stderr_lines': [], '_ansible_item_result': True, u'end': u'2017-07-26 11:31:47.372346', '_ansible_no_log': False, u'stdout': u'', u'cmd': u"readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", u'rc': 1, 'item': u'/dev/sdah', u'delta': u'0:00:00.004611', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u"readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-07-26 11:31:47.367735', 'failed': False}, u'/dev/sdah'], u'delta': u'0:00:00.004145', u'stderr': u'', u'changed': False, u'invocation': {u'module_args': {u'warn': True, u'executable': None, u'_uses_shell': True, u'_raw_params': u'parted --script /dev/sdah print > /dev/null 2>&1', u'removes': None, u'creates': None, u'chdir': None}}, 'stdout_lines': [], 'failed_when_result': False, u'start': u'2017-07-26 11:36:12.015208', 'failed': False}, u'/dev/sdah']) => {"changed": false, "cmd": "sgdisk --zap-all --clear --mbrtogpt -- /dev/sdah || sgdisk --zap-all --clear --mbrtogpt -- /dev/sdah", "delta": "0:00:00.008660", "end": "2017-07-26 13:07:28.664214", "failed": true, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "parted --script /dev/sdah print > /dev/null 2>&1", "delta": "0:00:00.004145", "end": "2017-07-26 11:36:12.019353", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "parted --script /dev/sdah print > 
/dev/null 2>&1", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": [{"_ansible_item_result": true, "_ansible_no_log": false, "_ansible_parsed": true, "changed": false, "cmd": "readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "delta": "0:00:00.004611", "end": "2017-07-26 11:31:47.372346", "failed": false, "failed_when_result": false, "invocation": {"module_args": {"_raw_params": "readlink -f /dev/sdah | egrep '/dev/([hsv]d[a-z]{1,2}|cciss/c[0-9]d[0-9]p|nvme[0-9]n[0-9]p)[0-9]{1,2}$'", "_uses_shell": true, "chdir": null, "creates": null, "executable": null, "removes": null, "warn": true}}, "item": "/dev/sdah", "rc": 1, "start": "2017-07-26 11:31:47.367735", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdah"], "rc": 1, "start": "2017-07-26 11:36:12.015208", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}, "/dev/sdah"], "rc": 4, "start": "2017-07-26 13:07:28.655554", "stderr": "Problem opening /dev/sdah for reading! Error is 2.\nThe specified file does not exist!\nProblem opening '' for writing! Program will now terminate.\nWarning! MBR not overwritten! Error is 2!\nCaution! Secondary header was placed beyond the disk's limits! Moving the\nheader, but other problems may occur!\nUnable to open device '' for writing! Errno is 2! Aborting write!\nProblem opening /dev/sdah for reading! Error is 2.\nThe specified file does not exist!\nProblem opening '' for writing! Program will now terminate.\nWarning! MBR not overwritten! Error is 2!\nCaution! Secondary header was placed beyond the disk's limits! Moving the\nheader, but other problems may occur!\nUnable to open device '' for writing! Errno is 2! Aborting write!", "stderr_lines": ["Problem opening /dev/sdah for reading! Error is 2.", "The specified file does not exist!", "Problem opening '' for writing! Program will now terminate.", "Warning! 
MBR not overwritten! Error is 2!", "Caution! Secondary header was placed beyond the disk's limits! Moving the", "header, but other problems may occur!", "Unable to open device '' for writing! Errno is 2! Aborting write!", "Problem opening /dev/sdah for reading! Error is 2.", "The specified file does not exist!", "Problem opening '' for writing! Program will now terminate.", "Warning! MBR not overwritten! Error is 2!", "Caution! Secondary header was placed beyond the disk's limits! Moving the", "header, but other problems may occur!", "Unable to open device '' for writing! Errno is 2! Aborting write!"], "stdout": "Information: Creating fresh partition table; will override earlier problems!\nInformation: Creating fresh partition table; will override earlier problems!", "stdout_lines": ["Information: Creating fresh partition table; will override earlier problems!", "Information: Creating fresh partition table; will override earlier problems!"]}

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 30 (14 by maintainers)

Top GitHub Comments

1 reaction
BenoitKnecht commented, Dec 16, 2019

On most distributions, udev already generates persistent disk names under /dev/disk/by-path/.

@mcv21 Wouldn’t it solve your issue if you specified your disk paths as /dev/disk/by-path/... instead of /dev/sda? If a disk fails and you replace it, udev should create a link with the exact same name since it’ll be plugged in on the same connector.
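To find the persistent name that corresponds to each current /dev/sdX node, something like the following Python sketch could be run on the storage host. It is only an illustration: the function name `persistent_names` is made up for this example, and it assumes udev has populated /dev/disk/by-path/ with symlinks as described above.

```python
import os


def persistent_names(by_path_dir="/dev/disk/by-path"):
    """Map each resolved device node to the by-path symlinks pointing at it.

    `persistent_names` is a hypothetical helper, not part of ceph-ansible.
    """
    mapping = {}
    for name in sorted(os.listdir(by_path_dir)):
        link = os.path.join(by_path_dir, name)
        if not os.path.islink(link):
            continue  # only the udev-generated symlinks are of interest
        # Resolve the symlink to the kernel-assigned node, e.g. /dev/sdbk
        target = os.path.realpath(link)
        mapping.setdefault(target, []).append(link)
    return mapping


if __name__ == "__main__":
    for device, links in sorted(persistent_names().items()):
        print(device, "->", ", ".join(links))
```

The by-path links printed for each whole disk are the candidates to put in the inventory in place of the /dev/sdX names, since they follow the connector rather than the order in which the kernel detected the drives.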

0 reactions
github-actions[bot] commented, Aug 18, 2021

This issue has been automatically closed due to inactivity. Please re-open if this still requires investigation.


