OSD creation isn't fully idempotent

Splitting off issue #72.

I'm finished with this issue and am now testing the other one, #69 (it took some time to prepare the bare-metal cluster that I am going to install). I have some problems with idempotent OSD creation: the role stops if an OSD is already in use.

@lae Should we check whether a defined OSD is already in use and, if so, move on with the other tasks?

Example: I had a timeout during OSD creation and had to rerun the playbook. The OSDs were created properly, but execution stops after OSD creation, so the role does not create pools and storages at this stage.

_Originally posted by @mholasek in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539083013_

Yes. Unfortunately, I’m not able to test the idempotency of that part myself (I’m no longer working at fireeye and don’t have access to much physical hardware anymore), and CI can’t either. All tasks should be made idempotent.

I’m not sure if you can just use a creates argument on the OSD creation step (because iirc OSD creation picks a random number for the folder name) but maybe you could add another task before the creation step to check if there is an existing OSD associated with the selected drive, and then skip OSD creation/configuration tasks based on the result?

_Originally posted by @lae in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539086108_
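
The check-then-skip idea suggested above can be sketched in Python as follows. This only illustrates the control flow and is not the role's actual task list: `ensure_osds`, `has_osd`, and `create_osd` are hypothetical names, and the real check and creation commands are passed in as callables rather than assumed.

```python
from typing import Callable, Iterable, List, Tuple


def ensure_osds(
    devices: Iterable[str],
    has_osd: Callable[[str], bool],
    create_osd: Callable[[str], None],
) -> List[Tuple[str, bool]]:
    """Create an OSD only on devices that do not have one yet.

    Returns (device, changed) pairs, where changed is True only if
    create_osd was actually called for that device.
    """
    results: List[Tuple[str, bool]] = []
    for device in devices:
        if has_osd(device):
            # Already provisioned: report "unchanged" and keep going,
            # so later steps (pools, storages) are still reached.
            results.append((device, False))
            continue
        create_osd(device)
        results.append((device, True))
    return results
```

Running it twice with the same device list should report every device as unchanged on the second pass, which is the property the playbook rerun needs.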

@lae I am not sure why the current OSD creation uses the “creates:” argument to check for a /dev/sd?1. I am doing my tests with Proxmox v6.0 and Nautilus, and it does not create any partition. Maybe this is correct under Luminous, but I have no way to test that.

That’s why I tried another approach: checking whether any Ceph LVM volume already exists via the `ceph-volume lvm list` command. I am not sure whether this can be used for Luminous as well, but I would guess so.

What do you think, should we go for this? If so, I’ll do some additional testing, enhance the README over the next few days, and create a pull request after that. Maybe we should mention that the Ceph tasks are still beta and only tested with Proxmox v6.0? You can take a look at the fork (feature branch): https://github.com/mholasek/ansible-role-proxmox/tree/feature/ceph-replication-network.

_Originally posted by @mholasek in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539579164_

@mholasek What’s some example JSON output from that command once you get a successful OSD provisioned with pveceph? (docs show that you can use --format=json) Just to confirm, the lvm list command doesn’t have anything to do with Linux LVM, right? (In other words, this command should be appropriate for checking provisioned OSDs in all deployment scenarios with pveceph?) If that’s the case, we could possibly create a small Ansible module for OSD creation without needing to write any parsing code.

So I had forgotten earlier, but I believe {device}1 was selected on the basis that the pveceph osd create command creates a partition on the device, and that was picked instead of something like /var/lib/ceph/osd-5 (which is what I was referring to in my previous comment) to keep the OSD creation step idempotent. (For mutual reference, you’re referring to this line, right? So it’s not necessarily /dev/sd?1, unless you’re finding that somewhere else?) The pveceph tool expects the devices passed to be one of the patterns listed in this comment, at least in PVE5. Are you trying with something different? (Maybe you could check if that code is any different in PVE6?)

Also, I did add a note to the README in the Ceph PR I mentioned earlier stating that PVE Ceph management with this role is experimental, so I think we’re fine there.

_Originally posted by @lae in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539608972_
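
For the `ceph-volume lvm list --format=json` check discussed above, a minimal Python sketch of the parsing step might look like this. It assumes the JSON is an object keyed by OSD id whose values are lists of volume entries, each carrying a `devices` list. That shape is an assumption to verify against real output on the target PVE and Ceph versions. The `devices_with_osd` and `has_osd` names are hypothetical, and the sample string in the usage note is made up for illustration.

```python
import json
from typing import Set


def devices_with_osd(lvm_list_json: str) -> Set[str]:
    """Collect the physical devices that back an existing OSD volume.

    Assumes input shaped like {"<osd id>": [{"devices": [...]}, ...]};
    verify this against real `ceph-volume lvm list --format=json` output.
    """
    if not lvm_list_json.strip():
        # Treat empty output as "no OSDs provisioned".
        return set()
    report = json.loads(lvm_list_json)
    devices: Set[str] = set()
    for volumes in report.values():
        for volume in volumes:
            devices.update(volume.get("devices", []))
    return devices


def has_osd(device: str, lvm_list_json: str) -> bool:
    """Return True if the given device already appears in the listing."""
    return device in devices_with_osd(lvm_list_json)
```

For example, with the illustrative input `'{"0": [{"type": "block", "devices": ["/dev/sdb"]}]}'`, `has_osd("/dev/sdb", ...)` is true and `has_osd("/dev/sdc", ...)` is false, so only `/dev/sdc` would go on to the creation step.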

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 11 (7 by maintainers)

Top GitHub Comments

lae commented, Oct 8, 2019 (1 reaction)

It looks like the ceph-ansible repository has an existing Ansible module that we could probably leverage. https://github.com/ceph/ceph-ansible/blob/master/library/ceph_volume.py

lae commented, Nov 20, 2019 (0 reactions)

Fixed by #81
