Support suppressing empty resource expansion
Version
4.6.0
Feature
I don’t think this is a bug per se, but I seem to have hit a limitation in what riot can do.
Since v1.4, PDF has supported embedding an RDF graph as metadata. This has been standardised as XMP.
I believe that, by convention, XMP uses the empty IRI to indicate that the subject of triples is the PDF file itself. The Wikipedia example suggests this; however, I haven’t verified this by checking the XMP specification.
I wrote some simple metadata in Turtle to illustrate the problem/limitation:
@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>.
<>
dc:description "An example that demonstrates a problem."@en;
dc:title "An example title"@en;
dc:creator "Jane Doe";
dc:date "2022-12-04";
dc:language "en-GB";
.
I am able to use the riot command to convert this Turtle data into a corresponding RDF/XML file, as needed by XMP.
paul@sprocket:~/Riot problem$ riot --formatted=RDF/XML example.ttl
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="file:///home/paul/Riot%20problem/example.ttl">
<dc:language>en-GB</dc:language>
<dc:date>2022-12-04</dc:date>
<dc:creator>Jane Doe</dc:creator>
<dc:title xml:lang="en">An example title</dc:title>
<dc:description xml:lang="en">An example that demonstrates a problem.</dc:description>
</rdf:Description>
</rdf:RDF>
paul@sprocket:~/Riot problem$
The problem here is that riot “helpfully” expands the empty IRI into a corresponding file: IRI. Note that the rdf:Description element contains the rdf:about attribute with the value file:///home/paul/Riot%20problem/example.ttl.
This is a problem for two reasons: (1) the resource identified is the Turtle file rather than the PDF file, and (2) the IRI is absolute, whereas the PDF file may be renamed or copied onto a different system.
I was hoping for riot to generate the following XML:
<rdf:RDF
xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:dc="http://purl.org/dc/elements/1.1/">
<rdf:Description rdf:about="">
<dc:language>en-GB</dc:language>
<dc:date>2022-12-04</dc:date>
<dc:creator>Jane Doe</dc:creator>
<dc:title xml:lang="en">An example title</dc:title>
<dc:description xml:lang="en">An example that demonstrates a problem.</dc:description>
</rdf:Description>
</rdf:RDF>
As far as I’m aware, the output from riot is correct, as the empty IRI is equivalent to the expanded resource (again, I haven’t checked this against the RDF spec). Therefore, I wouldn’t classify this as a bug.
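The equivalence mentioned above can be illustrated with generic relative-reference resolution. This is a minimal sketch that uses Python's `urljoin` as a stand-in for an RFC 3986 style resolver; it is not Jena's resolver, and it assumes the base IRI is the location of the Turtle file, as in the riot output shown earlier.

```python
from urllib.parse import urljoin

# Base IRI taken from the riot output above (the location of the Turtle file).
base = "file:///home/paul/Riot%20problem/example.ttl"

# Resolving the empty relative reference against the base yields the base itself,
# which matches the rdf:about value riot wrote.
resolved = urljoin(base, "")
print(resolved)
```

Under this reading, the empty IRI in the Turtle source and the file: IRI in the output name the same resource once a base is fixed, which is why the output is arguably correct even though it is not what XMP needs.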
However, the output isn’t what I need, and I haven’t found a riot option that produces the desired output, i.e., with rdf:about="".
A simple solution might be to add an option that suppresses riot/Jena’s ability to expand an empty IRI. A more sophisticated solution would identify IRIs that are the input file itself and replace them with the empty IRI.
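Until such an option exists, the second idea can be approximated outside riot by post-processing its output. The following is only a sketch of that workaround, not a riot or Jena feature: `blank_about` is a hypothetical helper, the sample document is a shortened version of the output above, and the code only looks at rdf:about on rdf:Description elements (it ignores typed node elements and rdf:resource references).

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Keep the familiar prefixes when serialising.
ET.register_namespace("rdf", RDF_NS)
ET.register_namespace("dc", DC_NS)


def blank_about(rdf_xml: str, base_iri: str) -> str:
    """Hypothetical helper: set rdf:about to "" wherever it equals base_iri."""
    root = ET.fromstring(rdf_xml)
    about = f"{{{RDF_NS}}}about"
    for elem in root.iter(f"{{{RDF_NS}}}Description"):
        if elem.get(about) == base_iri:
            elem.set(about, "")
    return ET.tostring(root, encoding="unicode")


# Shortened sample of the riot output; the base is the Turtle file's IRI.
base = "file:///home/paul/Riot%20problem/example.ttl"
sample = f"""<rdf:RDF
    xmlns:rdf="{RDF_NS}"
    xmlns:dc="{DC_NS}">
  <rdf:Description rdf:about="{base}">
    <dc:creator>Jane Doe</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

result = blank_about(sample, base)
print(result)
```

The comparison is a plain string match against the IRI that riot emitted, so the caller has to supply exactly that IRI (including the percent-encoded space).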
Just as a side note: embedding the above RDF/XML infoset under a <x:xmpmeta xmlns:x="adobe:ns:meta/"> element allows podofoxmp to create a new PDF file that includes the desired RDF graph.
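The wrapping step can be sketched as a small string-level helper. `wrap_in_xmpmeta` is a hypothetical name, the sample RDF/XML is shortened, and the sketch assumes the input has no XML declaration; it does not cover anything else an XMP packet or a particular tool might require.

```python
import xml.etree.ElementTree as ET

XMP_META_NS = "adobe:ns:meta/"


def wrap_in_xmpmeta(rdf_xml: str) -> str:
    """Hypothetical helper: wrap RDF/XML text in an x:xmpmeta element."""
    return f'<x:xmpmeta xmlns:x="{XMP_META_NS}">\n{rdf_xml.strip()}\n</x:xmpmeta>\n'


# Shortened version of the desired RDF/XML, with the empty rdf:about.
rdf_xml = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="">
    <dc:creator>Jane Doe</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

packet = wrap_in_xmpmeta(rdf_xml)

# Sanity check: the wrapped text should still parse as well-formed XML.
root = ET.fromstring(packet)
print(packet)
```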
Are you interested in contributing a solution yourself?
None
Top GitHub Comments
History:
Jena has in the past been more lax about relative URIs, but that resulted in increased support costs (questions on users@). Such data can’t necessarily be read by other systems, and care is needed to pass the base end-to-end, for example when reloading a database.
Hi all,
Just to tidy up some loose ends, I’ve checked the XMP and PDF specifications.
XMP part 1, section 7.4 (“rdf:RDF and rdf:Description elements”) says:
XMP part 1 provides very little information about this AboutURI concept beyond identifying it as the rdf:about attribute of all top-level rdf:Description elements. Anything more is deemed out of scope for the XMP specification.
XMP part 3 defines how XMP is embedded in various files, including PDF. For PDF, part 3 gives an overview of how this is done, but also identifies the PDF specification as the authoritative definition on how XMP is embedded. Part 3 makes no mention of AboutURI.
I checked PDF v1.6, which shows how embedded XMP is placed within the definition of the item to which it refers. These placement rules mean that the target of the XMP is dictated by the XMP packet’s location within the file. In addition, the PDF specification makes no mention of AboutURI (which, I think, makes sense).
As there is no AboutURI, the RDF/XML (for an RDF graph describing an element within a PDF file) MUST contain only top-level rdf:Description elements that have an rdf:about attribute with an empty value.
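Taking that conclusion at face value, a generated file could be checked before it is embedded. This is a minimal sketch with a hypothetical helper, `non_empty_abouts`; it inspects only rdf:Description elements that are direct children of rdf:RDF, and it accepts either a bare rdf:RDF document or one wrapped in a single outer element such as x:xmpmeta.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def non_empty_abouts(xml_text: str) -> list:
    """Hypothetical check: return the rdf:about values (None if the attribute is
    missing) of top-level rdf:Description elements that are not the empty string."""
    root = ET.fromstring(xml_text)
    rdf_tag = f"{{{RDF_NS}}}RDF"
    # Accept rdf:RDF as the root or as a direct child of a wrapper element.
    rdf_root = root if root.tag == rdf_tag else root.find(rdf_tag)
    if rdf_root is None:
        raise ValueError("no rdf:RDF element found")
    about = f"{{{RDF_NS}}}about"
    descriptions = rdf_root.findall(f"{{{RDF_NS}}}Description")
    return [d.get(about) for d in descriptions if d.get(about) != ""]


template = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="{about}">
    <dc:creator>Jane Doe</dc:creator>
  </rdf:Description>
</rdf:RDF>"""

expanded = "file:///home/paul/Riot%20problem/example.ttl"
problems_desired = non_empty_abouts(template.format(about=""))
problems_expanded = non_empty_abouts(template.format(about=expanded))
print(problems_desired, problems_expanded)
```

An empty result means every top-level rdf:Description carries rdf:about=""; anything else lists the values that would need to be blanked.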