deleteOldVersions API
Describe the problem
After refreshing an index, the old version of the index remains in storage. We should keep old versions to support consistency and isolation of index data, but at some point they are no longer needed. It would be good to have an API to clean up the old versions.
Describe your proposed solution
API design
def deleteOldVersions(indexName: String)
hs.deleteOldVersions("indexName")
But there is currently no API to list the versions. I think it would be great to provide an API for index statistics so that a user can check the size of the index, the existing versions, the creation time, the last used time (from the event log), etc.
Now hs.index("indexName") returns an “indexContentPaths” column that shows the paths referenced by the latest index version. Based on that information, we could validate the given versions and determine which versions we should delete.
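The validation step described above could look roughly like the following. This is only a sketch, not the Hyperspace implementation: it assumes that each index version lives in a directory named "v__=<n>", and the object and helper names (OldVersionSketch, versionOf, versionsToDelete) are hypothetical. It treats any version referenced by “indexContentPaths” as still in use and everything else as a deletion candidate.

```scala
// Sketch only: the "v__=<n>" directory naming and all names here are assumptions.
object OldVersionSketch {
  // Matches a version directory segment such as "v__=3" anywhere in a path.
  private val VersionDir = """v__=(\d+)""".r

  /** Extracts the version number from a path, if it has a "v__=<n>" segment. */
  def versionOf(path: String): Option[Int] =
    VersionDir.findFirstMatchIn(path).map(_.group(1).toInt)

  /**
   * Returns the version directories that no path in `indexContentPaths`
   * refers to. Directories without a version segment are never returned.
   */
  def versionsToDelete(
      allVersionDirs: Seq[String],
      indexContentPaths: Seq[String]): Seq[String] = {
    val referenced = indexContentPaths.flatMap(versionOf).toSet
    allVersionDirs.filter(dir => versionOf(dir).exists(v => !referenced.contains(v)))
  }
}
```

Comparing against the set of referenced versions, rather than keeping only the highest version number, is a deliberate choice: the latest index version may reference files in more than one version directory, and those must not be removed.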
Additional context
Issue Analytics
- State:
- Created 3 years ago
- Comments:5 (3 by maintainers)
Yea, hs.index("indexName") was added after those sentences were written.
@sezruby OK, I will work on this. Since I am not familiar with Delta Lake time travel queries, I will do the simple implementation first and then ask you how time travel queries work.
As for the naming convention, delete index doesn’t remove the actual index files, but vacuum index does remove the files. Since the new API actually removes the index files (except the latest one), I think it is more like vacuumOldVersions or vacuumOld. WDYT?