OSD creation isn't fully idempotent
Splitting off issue #72.
I am finished with this issue and am now testing #69 (it took some time to prepare the bare metal cluster that I am going to install on). I have some problems regarding idempotent creation of OSDs: the role stops if an OSD is already in use.
@lae Should we check if a defined OSD is already in use and move on with the other tasks?
Example: I hit a timeout during OSD creation and it was necessary to replay the playbook. The OSDs were created properly, but execution stopped right after OSD creation, so the role never got to creating the pools and storages.
_Originally posted by @mholasek in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539083013_
Yes. I'm unfortunately not able to test idempotency of that part myself (I'm no longer working at FireEye and don't have access to a lot of physical hardware anymore) and CI can't either, but all tasks should be made idempotent.
I'm not sure if you can just use a `creates` argument on the OSD creation step (because IIRC OSD creation picks a random number for the folder name), but maybe you could add another task before the creation step to check if there is an existing OSD associated with the selected drive, and then skip the OSD creation/configuration tasks based on the result?
_Originally posted by @lae in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539086108_
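As an illustration of that check-then-skip idea, here is a minimal sketch. It is not the role's actual code: the task names and the `pve_ceph_osds` variable are assumptions, and it uses the `ceph-volume` listing command that comes up in the next comment, with a simple substring test against its output.

```yaml
# Hypothetical sketch only; task names and the pve_ceph_osds variable are illustrative.
- name: List OSD volumes that already exist on this host
  command: ceph-volume lvm list --format json
  register: _existing_osds
  changed_when: false

- name: Create Ceph OSDs only on devices that do not appear in that listing
  command: "pveceph osd create {{ item.device }}"
  loop: "{{ pve_ceph_osds }}"
  when: item.device not in _existing_osds.stdout
```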
@lae I am not sure why the current OSD creation is done via the `creates:` argument checking for a `/dev/sd?1` partition. I am doing my tests with Proxmox v6.0 and Nautilus, and it does not create any partition. Maybe this is correct under Luminous, but I have no way to test that.
That's why I tried another approach: checking whether there is already any Ceph LVM volume via the `ceph-volume lvm list` command. But I am not sure if this can be used for Luminous as well (I would guess so). What do you think, should we go for this? If OK, I'll do some additional testing, enhance the README within the next days, and create a pull request after that. Maybe we should mention that the Ceph tasks are still beta and only tested with Proxmox v6.0? You can take a look at the fork (feature branch): https://github.com/mholasek/ansible-role-proxmox/tree/feature/ceph-replication-network.
_Originally posted by @mholasek in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539579164_
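If a plain substring test against the listing output (as in the sketch above) feels too loose, the JSON could be parsed into a device list first. This refinement assumes the `--format json` output is a mapping of OSD ids to lists of volume entries, each carrying a `devices` array (verify against real output), and it continues from the `_existing_osds` variable registered in the earlier sketch.

```yaml
# Hypothetical sketch; assumes JSON shaped like {"0": [{"devices": ["/dev/sdb"], ...}]}
- name: Collect the physical devices that already back an OSD
  set_fact:
    _osd_devices: >-
      {{ _existing_osds.stdout | from_json | dict2items
         | map(attribute='value') | flatten
         | map(attribute='devices') | flatten | list }}

- name: Create Ceph OSDs only on devices not in that list
  command: "pveceph osd create {{ item.device }}"
  loop: "{{ pve_ceph_osds }}"
  when: item.device not in _osd_devices
```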
@mholasek What's some example JSON output from that command once you get a successful OSD provisioned with `pveceph`? (The docs show that you can use `--format=json`.) Just to confirm, the `lvm list` command doesn't have anything to do with Linux LVM, right? (In other words, this command should be appropriate for checking provisioned OSDs in all deployment scenarios with `pveceph`?) If that's the case, we could possibly create a small Ansible module for OSD creation without needing to write any parsing code.
So I had forgotten earlier, but I believe `{device}1` was selected on the basis that the `pveceph osd create` command creates a partition on the device, and that was picked instead of something like `/var/lib/ceph/osd-5` (which is what I was referring to in my previous comment) to keep the OSD creation step idempotent. (For mutual reference, you're referring to this line, right? So it's not necessarily `/dev/sd?1`, unless you're finding that somewhere else?) The `pveceph` tool expects the devices passed to it to match one of the patterns listed in this comment, at least in PVE5. Are you trying with something different? (Maybe you could check whether that code is any different in PVE6?)
Also, I did add a note to the README in the Ceph PR I mentioned earlier stating that PVE Ceph management with this role is experimental, so I think we're fine there.
_Originally posted by @lae in https://github.com/lae/ansible-role-proxmox/issues/72#issuecomment-539608972_
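For reference, the `creates:`-based pattern being discussed looks roughly like the sketch below (again with an assumed `pve_ceph_osds` variable, not the role's literal task). This is only idempotent if `pveceph osd create` really leaves a first partition on the device, which the comments above suggest is no longer the case on PVE6 with Nautilus.

```yaml
# Hypothetical sketch of the partition-based guard; only idempotent when
# the create command actually produces a first partition on the device.
- name: Create Ceph OSDs
  command: "pveceph osd create {{ item.device }}"
  args:
    creates: "{{ item.device }}1"
  loop: "{{ pve_ceph_osds }}"
```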
Top GitHub Comments
It looks like the `ceph-ansible` repository has an existing Ansible module that we could probably leverage: https://github.com/ceph/ceph-ansible/blob/master/library/ceph_volume.py
Fixed by #81