CLI option to merge multiple RDBStorages
Motivation
I have multiple studies running on different computers that are not in the same network, and I don’t want to or can’t set up a central DB server.
Description
A CLI option like optuna merge target.db postgresql://... would copy all studies and trials from source.db into target.db. This can be used both to migrate a storage and to merge data from different storages after the fact.
The main thing architecturally that currently makes this somewhat hard (I think) is that studies and trials don’t have globally unique identifiers.
Here’s how I think this would go:
- Verify that the storage versions are current for both source and target
- Load all studies and trials from the source storage.
- For every study and trial in source storage without system attribute guid, generate a random ID using e.g. uuid.uuid4()
- Same for target storage
- For every study and trial in source storage, check if same GUID already exists in target storage. If not, copy it over.
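The steps above can be sketched roughly as follows. This is only an illustration of the GUID-based deduplication idea, not Optuna code: storages are modeled as plain lists of study dicts, and the names `ensure_guid`, `merge_storages`, and the record layout (`name`, `system_attrs`, `trials`, `number`) are all hypothetical. The storage version check from the first step is left out.

```python
import copy
import uuid


def ensure_guid(record):
    """Return the record's GUID, generating a random one if it has none."""
    attrs = record.setdefault("system_attrs", {})
    if "guid" not in attrs:
        attrs["guid"] = str(uuid.uuid4())
    return attrs["guid"]


def merge_storages(source, target):
    """Copy studies/trials missing from target; return the number of trials copied.

    Both arguments are hypothetical in-memory stand-ins for a storage:
    a list of study dicts, each with a "name" and a list of "trials".
    """
    # Tag everything on both sides first, so every record can be compared.
    for storage in (source, target):
        for study in storage:
            ensure_guid(study)
            for trial in study["trials"]:
                ensure_guid(trial)

    target_studies = {ensure_guid(study): study for study in target}
    copied = 0
    for study in source:
        guid = ensure_guid(study)
        dest = target_studies.get(guid)
        if dest is None:
            # Study is unknown in target: create it empty, trials follow below.
            dest = {
                "name": study["name"],
                "system_attrs": dict(study["system_attrs"]),
                "trials": [],
            }
            target.append(dest)
            target_studies[guid] = dest

        known_trials = {ensure_guid(trial) for trial in dest["trials"]}
        for trial in study["trials"]:
            if ensure_guid(trial) in known_trials:
                continue  # already merged earlier, do not duplicate
            new_trial = copy.deepcopy(trial)
            new_trial["number"] = len(dest["trials"])  # renumbered in target
            dest["trials"].append(new_trial)
            copied += 1
    return copied
```

Note that in this sketch the GUIDs generated for the source are written back into the source records. If they were only kept in memory, a second merge from the same source would get fresh GUIDs and copy everything again.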
Studies and trials would get different ids and trial numbers than before, but I don’t think it would be an issue?
Ideally, the GUIDs would be generated on study / trial creation, but I’m not sure if you want that. This way, it is ensured that trials and studies are not duplicated.
Study names could also kinda be used as a “GUID” but that’s somewhat dangerous.
I’ll probably need this feature soon anyways so I could probably make a PR. Let me know if you think it would be merged.
Alternatives (optional)
Set up a central PostgreSQL server beforehand and make sure you always use it. This only covers part of the reason to have this feature and is less flexible.
Issue Analytics
- Created 3 years ago
- Comments: 9 (2 by maintainers)
This issue was closed automatically because it had not seen any recent activity. If you want to discuss it, you can reopen it freely.
This would really be a great feature to have. I am in a similar situation: I am running processes on different computers that cannot talk to each other. It would be great to be able to merge all the save files later and then have all the independent processes start with the merged data.