Tarql outputs invalid RDF with URIs containing spaces

I have encountered several cases where I build a URI using code like:

bind(uri(concat("http://example.com/id/", ?ID)) as ?uri)

In the sample data, all the values in the ID column looked like usable identifiers. Only when the code is used to transform the complete dataset does it transpire that some of the values contain spaces.

In such cases, Tarql will happily output RDF in which the spaces in the URI are not encoded, which makes the output invalid, e.g. http://example.com/id/some text.

The issue is only discovered when some downstream tooling throws an error.

Preferably, Tarql should throw an error when serializing the RDF, so that problems are discovered earlier.

Workaround: Always use the ENCODE_FOR_URI function to encode any characters that are not URI-friendly.
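
A minimal sketch of this workaround applied to the snippet above. The ID column comes from the report; the ex: prefix and the ex:id property are hypothetical and only there to make the query complete. ENCODE_FOR_URI is the standard SPARQL 1.1 function, which percent-encodes every character outside the unreserved set, so a value such as "some text" becomes "some%20text".

```sparql
PREFIX ex: <http://example.com/ns#>   # hypothetical namespace, for illustration only

CONSTRUCT {
  ?uri ex:id ?ID .                    # ex:id is an assumed property, not from the report
}
WHERE {
  # Encode the raw cell value before concatenating it into the URI,
  # so spaces and other reserved characters are percent-encoded.
  BIND (URI(CONCAT("http://example.com/id/", ENCODE_FOR_URI(?ID))) AS ?uri)
}
```

Note that ENCODE_FOR_URI also encodes characters such as "/" and ":", so it should wrap only the variable part of the URI, not the whole string.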

Issue Analytics

  • State: open
  • Created 4 years ago
  • Reactions: 1
  • Comments: 5 (3 by maintainers)

Top GitHub Comments

cygri commented, Apr 7, 2019 (1 reaction)

That would be one reasonable behaviour. The advantage of treating it as an error would be that it would double as a way of checking whether a string is a valid IRI: isIRI(IRI(?x))

cygri commented, Apr 7, 2019 (1 reaction)

This is a known issue with Jena. The URI/IRI functions don’t do syntax checks. ENCODE_FOR_URI is indeed the best workaround currently available. SPARQL has no concept of “throwing an error” during query execution, so we can’t write a query that aborts with an error if we detect a problem.

If this were fixed in Jena, the fix would probably be that ?uri ends up being unbound. That makes sense with the way SPARQL expression evaluation works, but would arguably be even worse for Tarql because with an unbound ?uri, the CONSTRUCT template will simply not produce any triples where ?uri is used. So, the result would be a valid RDF file, but with some triples silently missing.

Maybe Tarql needs a tarql:assert(...) function that aborts the entire query execution if the argument is false or unbound. Then one could do tarql:assert(isIRI(?uri)) or simply tarql:assert(?uri) to force execution to fail early.
