mets:file URL handling: keep remote links
Currently with workspaces we can either keep images on the remote side by using http URLs in mets:file/mets:FLocat/@xlink:href
(which means they have to be downloaded again and again during processing), or get local filesystem copies with relative paths by cloning with download=True
or by bagging and spilling (but then the source information is lost forever).
When processing is finished and I want to make my workspace public, I now have to upload my shiny new results in addition to the original images – which I might not even have the rights to publish myself. It would be much better if the original remote URLs could be used again for that – even if I used local copies in between.
METS-XML allows that: mets:FLocat has maxOccurs="unbounded" within mets:file, with the following documented semantics:
> The file element provides access to content files for a METS object. A file element may contain one or more FLocat elements, which provide pointers to a content file, and/or an FContent element, which wraps an encoded version of the file. Note that ALL FLocat and FContent elements underneath a single file element should identify/contain identical copies of a single file.
So why don’t we keep two FLocat elements in that case – one relative path for local processing and one remote URL for provenance/bookkeeping? When making results public, the local copies could be disposed of again, e.g. when bagging with --manifestation-depth=partial.
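For illustration, a mets:file entry carrying both locations might look like the following sketch (the IDs, file group, and paths are made up; per the METS semantics quoted above, both FLocat entries must point to identical copies of the same file):

```xml
<mets:file ID="FILE_0001" MIMETYPE="image/tiff">
  <!-- remote original: kept for provenance/bookkeeping -->
  <mets:FLocat LOCTYPE="URL"
               xlink:href="https://example.org/images/00000001.tif"/>
  <!-- local working copy: relative path, disposable before publication -->
  <mets:FLocat LOCTYPE="OTHER" OTHERLOCTYPE="FILE"
               xlink:href="OCR-D-IMG/00000001.tif"/>
</mets:file>
```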
Issue Analytics
- State:
- Created: 4 years ago
- Reactions: 1
- Comments: 6 (3 by maintainers)
Top GitHub Comments
We could also implement the local_filename stuff as additional FLocat as you propose and have a processor that strips the METS down to ZVDD requirements.
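A minimal sketch of such a stripping step, using only the stdlib ElementTree and a hypothetical `keep_remote_flocats` helper (not an existing OCR-D or METS library API): for every mets:file that still has at least one remote FLocat, drop the local ones.

```python
# Hypothetical sketch: strip local FLocat copies from a METS document,
# keeping only remote http(s) URLs for publication.
import xml.etree.ElementTree as ET

NS = {
    'mets': 'http://www.loc.gov/METS/',
    'xlink': 'http://www.w3.org/1999/xlink',
}

def keep_remote_flocats(mets_xml: bytes) -> bytes:
    """Remove every FLocat whose @xlink:href is not an http(s) URL,
    but only from files that also carry at least one remote FLocat
    (otherwise the file would lose its only location)."""
    root = ET.fromstring(mets_xml)
    href = '{%s}href' % NS['xlink']
    for file_ in root.iter('{%s}file' % NS['mets']):
        flocats = file_.findall('mets:FLocat', NS)
        remote = [f for f in flocats
                  if (f.get(href) or '').startswith(('http://', 'https://'))]
        if remote:
            for f in flocats:
                if f not in remote:
                    file_.remove(f)  # drop the local working copy
    return ET.tostring(root)
```

The guard (`if remote:`) matters: files that were never downloaded from a remote source keep their local FLocat untouched.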
Should be revisited now that the OLA-HD client has arrived.