Tarql outputs invalid RDF with URIs containing spaces
I have encountered several cases where I build a URI using code like:
bind(uri(concat("http://example.com/id/", ?ID)) as ?uri)
Where in the sample data all the values in the ID column looked like usable identifiers. Only when the code is used to transform a complete dataset does it transpire that some of the values contain spaces.
In such cases, Tarql will happily output RDF where the spaces in the URI are not encoded, which makes the output invalid, e.g. http://example.com/id/some text.
The issue is only discovered when some downstream tooling throws an error.
Preferably Tarql should throw an error when serializing the RDF so problems are discovered earlier.
Workaround: Always use the ENCODE_FOR_URI function to properly encode any characters that are not URI-friendly.
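A minimal sketch of that workaround, assuming a CSV file with an ID column. The ex: prefix and the ex:identifier property are made up for illustration and are not part of the original report:

```sparql
# Hypothetical vocabulary, used only for this example.
PREFIX ex: <http://example.com/ns#>

CONSTRUCT {
  ?uri ex:identifier ?ID .
}
WHERE {
  # ENCODE_FOR_URI percent-encodes spaces and other reserved characters,
  # so an ID of "some text" becomes "some%20text" before it is
  # concatenated onto the base URI.
  BIND (URI(CONCAT("http://example.com/id/", ENCODE_FOR_URI(?ID))) AS ?uri)
}
```

Note that only the ID value is passed through ENCODE_FOR_URI, not the whole string. Encoding the full URI would also escape the ":" and "/" characters of the base.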
Issue Analytics
- State:
- Created 4 years ago
- Reactions:1
- Comments:5 (3 by maintainers)
That would be one reasonable behaviour. The advantage of treating it as an error would be that it would double as a way of checking whether a string is a valid IRI:
isIRI(IRI(?x))
This is a known issue with Jena. The URI/IRI functions don’t do syntax checks. encode_for_URI is indeed the best workaround currently available. SPARQL has no concept of “throwing an error” during query execution, so we can’t write a query that aborts with an error if we detect a problem.

If this were fixed in Jena, the fix would probably be that ?uri ends up being unbound. That makes sense with the way SPARQL expression evaluation works, but would arguably be even worse for Tarql, because with an unbound ?uri, the CONSTRUCT template will simply not produce any triples where ?uri is used. So the result would be a valid RDF file, but with some triples silently missing.

Maybe Tarql needs a tarql:assert(...) function that aborts the entire query execution if the argument is false or unbound. Then one could do tarql:assert(isIRI(?uri)) or simply tarql:assert(?uri) to force execution to fail early.
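
Since no assert function of that kind exists at present, one possible stopgap is a separate validation pass over the same CSV before the real transformation. The sketch below is only an illustration under assumptions: the ex:unsafeId property is invented for the example, and the check simply flags every ID value that ENCODE_FOR_URI would change. It reports problems as triples and does not abort execution.

```sparql
# Hypothetical vocabulary, used only for this example.
PREFIX ex: <http://example.com/ns#>

CONSTRUCT {
  # One blank node per offending row, carrying the raw ID value.
  [] ex:unsafeId ?ID .
}
WHERE {
  # Keep only rows whose ID differs from its percent-encoded form,
  # i.e. values containing spaces or other characters that would
  # need escaping inside a URI.
  FILTER (ENCODE_FOR_URI(?ID) != ?ID)
}
```

If this query produces no triples, every ID in the file is already safe to concatenate into a URI. Any output lists the values that need attention.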