Matching PAGE imageFilename to mets:file when imageFilename is not a URL
Scenario:
- Image files and PAGE referencing those image files by relative filepath: <Page imageFilename="foo.tif"/>
- Create a METS file and run workspace add: <mets:file GROUPID="page0001" xlink:href="file://path/to/bla/foo.tif"

Now the PAGE imageFilename and the xlink:href of the corresponding mets:file do not match anymore.
Issue Analytics
- State:
- Created 5 years ago
- Comments:34 (19 by maintainers)
Top GitHub Comments
Revisiting this with @tboenig:
- imageFilename in PAGE must always be a file path relative to that PAGE file, otherwise tools like Aletheia or PAGEViewer won’t work
- mets:FLocat is ideally a relative path from the mets.xml

So we need logic to determine the relative path from mets.xml to the image by resolving the imageFilename of a PAGE against the relative path to that PAGE.
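The path resolution described above could be sketched roughly like this. It is a minimal illustration using only the Python standard library, not the actual `ocrd` implementation; the function name `image_path_relative_to_mets` and the directory names `OCR-D-GT-PAGE` and `OCR-D-IMG` are hypothetical.

```python
import os.path


def image_path_relative_to_mets(mets_path, page_path, image_filename):
    """Resolve a PAGE imageFilename against the directory of its PAGE file
    and express the result relative to the directory holding mets.xml.

    Hypothetical helper for illustration; not part of the ocrd API.
    """
    mets_dir = os.path.dirname(os.path.abspath(mets_path))
    page_dir = os.path.dirname(os.path.abspath(page_path))
    # imageFilename is interpreted relative to the PAGE file, not the CWD
    image_abs = os.path.normpath(os.path.join(page_dir, image_filename))
    # Re-express the image location relative to the METS directory
    return os.path.relpath(image_abs, start=mets_dir)


# Assumed workspace layout (hypothetical names):
#   ws/mets.xml
#   ws/OCR-D-GT-PAGE/page0001.xml  with imageFilename="../OCR-D-IMG/foo.tif"
print(image_path_relative_to_mets(
    "ws/mets.xml",
    "ws/OCR-D-GT-PAGE/page0001.xml",
    "../OCR-D-IMG/foo.tif",
))
```

The sketch works purely on path strings and never touches the file system, so it says nothing about whether the image actually exists; a real implementation would presumably have to check that as well.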
Yes, that’s crucial. If we take this seriously, ocrd workspace add on PAGE-XML files will either take control of that file or make a copy of it (under the “right” path).

I guess we have to consider the possibility. If we solve this conceptually for Page/@imageFilename, it should work the same for AlternativeImage/@filename though.

IIUC you assume here that ocrd workspace add will be responsible for adding the image file along with the PAGE-XML file passed to it. We could have other provisions (like assuming the image file must already have been added by then), but let’s follow this logic for now:

Yes, the image could be placed under a fileGrp implicitly derived from the fileGrp for the PAGE-XML, or even the same fileGrp (just with a different MIME type and not appearing in the structMap).
If we add an option, why not just the name of the image file group (or none for “ignore images”)?
Right. And let’s think about the second use-case (adding PAGE-XML after image) more thoroughly: Now ocrd workspace add can go looking for the (basename of the) filename in the (image) FLocat URLs of the METS, and calculate the new relative path for the PAGE-XML under its destination directory. If it does not find an image with that filename, it can still go looking for an image with the same pageId. And then it can fail loudly.

Personally, I think this is the more sensible interface than add-image-via-PAGE.
This got me confused: I thought we were talking about adding PAGE-XML files here?