Sorting fails with custom datatype literal

Version

4.7.0-SNAPSHOT

What happened?

Dataset: some DBpedia stuff

Query (“give me longest river”):

SELECT DISTINCT ?uri WHERE {
  ?uri a <http://dbpedia.org/ontology/River> 
  {
    ?uri <http://dbpedia.org/ontology/length> ?l 
  } UNION {
    ?uri <http://dbpedia.org/property/length> ?l 
  } 
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1

It fails with HTTP 500 and “Comparison method violates its general contract!”.

DBpedia data is messy anyway: there are plenty of length literals with different datatypes. For debugging I reduced the literal mix to two datatypes, xsd:double and <http://dbpedia.org/datatype/kilometre>:

PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?uri WHERE {
  ?uri a <http://dbpedia.org/ontology/River> 
  {
    ?uri <http://dbpedia.org/ontology/length> ?l 
  } UNION {
    ?uri <http://dbpedia.org/property/length> ?l 
  } 
  filter(datatype(?l) in (xsd:double, <http://dbpedia.org/datatype/kilometre>))
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1

which still fails.

It does not fail if we project/select the length literal ?l though:

SELECT DISTINCT ?uri ?l WHERE {
  ?uri a <http://dbpedia.org/ontology/River> 
  {
    ?uri <http://dbpedia.org/ontology/length> ?l 
  } UNION {
    ?uri <http://dbpedia.org/property/length> ?l 
  } 
  filter(datatype(?l) in (xsd:double, <http://dbpedia.org/datatype/kilometre>))
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1

Searching for the Java exception, I found reports such as https://bugs.openjdk.org/browse/JDK-8234482, which also hints that TimSort needs a sufficient amount of data before it fails:

TimSort doesn’t throw this exception in all cases, though, only if there are a sufficiently large number of elements to be merged (generally > 32 elements, often hundreds are required), and if the sorting algorithm happens to detect an contradiction in the comparison method.

The JDK version is Java 11, in case that matters.

Relevant output and stacktrace

Query = PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>  SELECT DISTINCT ?uri WHERE {   ?uri a <http://dbpedia.org/ontology/River>    {     ?uri <http://dbpedia.org/ontology/length> ?l    } UNION {     ?uri <http://dbpedia.org/property/length> ?l    }    filter(datatype(?l) in (xsd:double,  	<http://dbpedia.org/datatype/kilometre>)) } ORDER BY DESC(?l) OFFSET 0 LIMIT 1
10:12:42 WARN  Fuseki          :: [32] RC = 500 : Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
	at java.util.TimSort.mergeHi(TimSort.java:903) ~[?:?]
	at java.util.TimSort.mergeAt(TimSort.java:520) ~[?:?]
	at java.util.TimSort.mergeCollapse(TimSort.java:448) ~[?:?]
	at java.util.TimSort.sort(TimSort.java:245) ~[?:?]
	at java.util.Arrays.sort(Arrays.java:1441) ~[?:?]
	at org.apache.jena.atlas.data.AbortableComparator.abortableSort(AbortableComparator.java:57) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:205) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:192) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterSort$SortedBindingIterator.initializeIterator(QueryIterSort.java:88) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init(IteratorDelayedInitialization.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext(IteratorDelayedInitialization.java:48) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:59) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.getInputNextUnseen(QueryIterDistinct.java:113) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.hasNextBinding(QueryIterDistinct.java:72) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.exec.RowSetStream.hasNext(RowSetStream.java:47) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:81) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeQuery(SPARQLQueryProcessor.java:380) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:279) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeWithParameter(SPARQLQueryProcessor.java:224) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:209) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:84) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.ActionProcessor.process(ActionProcessor.java:34) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.ActionBase.process(ActionBase.java:54) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:124) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.servlets.ActionExecLib.execAction(ActionExecLib.java:98) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.server.Dispatcher.dispatchAction(Dispatcher.java:164) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.server.Dispatcher.process(Dispatcher.java:156) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
	at org.apache.jena.fuseki.server.Dispatcher.dispatch(Dispatcher.java:83) ~[fuseki-server.jar:4.7.0-SNAPSHOT]

Steps to reproduce

  • Unzip the attachment
  • Run: sparql --data issue_1634.ttl --query issue_1634.rq

issue_1634.zip

Are you interested in making a pull request?

No response

Issue Analytics

  • State: open
  • Created 10 months ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

3 reactions
afs commented, Nov 24, 2022

This case fails:

# Compare (n1, n2) (n2, n3) (n1, n3)
> > <=:  1  1 -1 ::  206000.0e0   98800.0e0   "49.2"^^<http://dbpedia.org/datatype/kilometre>

Fortunately, it turns up quickly with an N^3 expansion that tests every combination of 3 elements (there are over a billion cases to try, so it is taking a while).

There are lots of failures: a rate of 3%-4% so far.
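
The N^3 expansion described above can be sketched in Java roughly as follows. The Lit class and the TOY comparator are hypothetical stand-ins, not Jena classes and not Jena's actual comparison logic. The toy rule (two doubles compare by value, any other pair by lexical form) is an assumption chosen only because it reproduces the 1 1 -1 pattern of the failing triple.

```java
import java.util.Comparator;
import java.util.List;

class TransitivityCheck {
    // Hypothetical stand-in for an RDF literal: lexical form plus datatype name.
    static final class Lit {
        final String lex;
        final String datatype;
        Lit(String lex, String datatype) { this.lex = lex; this.datatype = datatype; }
        @Override public String toString() { return "\"" + lex + "\"^^" + datatype; }
    }

    // Toy comparator (an assumption, NOT Jena's logic): two doubles compare by
    // value, any other pair falls back to comparing lexical forms as strings.
    static final Comparator<Lit> TOY = (a, b) -> {
        boolean bothDouble = a.datatype.equals("xsd:double") && b.datatype.equals("xsd:double");
        return bothDouble
                ? Double.compare(Double.parseDouble(a.lex), Double.parseDouble(b.lex))
                : Integer.signum(a.lex.compareTo(b.lex));
    };

    // Tries every ordered triple and returns the first (n1, n2, n3) where
    // n1 > n2 and n2 > n3 but n1 is not greater than n3; null if none exists.
    static <T> List<T> findViolation(List<T> items, Comparator<T> cmp) {
        for (T n1 : items)
            for (T n2 : items)
                for (T n3 : items)
                    if (cmp.compare(n1, n2) > 0
                            && cmp.compare(n2, n3) > 0
                            && cmp.compare(n1, n3) <= 0)
                        return List.of(n1, n2, n3);
        return null;
    }

    public static void main(String[] args) {
        List<Lit> data = List.of(
                new Lit("206000.0e0", "xsd:double"),
                new Lit("98800.0e0", "xsd:double"),
                new Lit("49.2", "dbpedia:kilometre"));
        // Prints the offending triple, or "null" if the comparator is transitive on this data.
        System.out.println(findViolation(data, TOY));
    }
}
```

A check like this is cubic in the number of elements, which is why running it over a full dataset takes a while. Sampling a subset of values per datatype is usually enough to surface a violation.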

1 reaction
rvesse commented, Nov 24, 2022

By the way, what’s the difference in sorting with and without projecting the sorted variable? With selection of ?l it doesn’t fail

The SPARQL specification, section 18.2.5, says that ORDER BY is applied prior to DISTINCT.

However, ARQ includes an optimiser that swaps the ordering, i.e. it applies DISTINCT first in cases where all the variables in the ORDER BY are also projected. This is semantically equivalent and substantially improves performance, because all the non-distinct solutions are thrown out before ordering. So in that case every row is guaranteed to be unique and you can’t get unstable sorting results, as every row is different (at least for your dataset), i.e. every row will compare differently to every other row even if individual ordering expressions might produce equal values.
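
As a rough illustration of why the two evaluation orders agree when the sort key is projected, here is a small Java sketch using plain streams. It does not use ARQ. The Row class, the sample values, and both method names are hypothetical, and it assumes a well-behaved comparator over numeric lengths.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Objects;
import java.util.stream.Collectors;

class DistinctOrderSketch {
    // Hypothetical solution row where both ?uri and ?l are projected.
    static final class Row {
        final String uri;
        final double l;
        Row(String uri, double l) { this.uri = uri; this.l = l; }
        @Override public boolean equals(Object o) {
            return o instanceof Row && ((Row) o).uri.equals(uri) && Double.compare(((Row) o).l, l) == 0;
        }
        @Override public int hashCode() { return Objects.hash(uri, l); }
        @Override public String toString() { return uri + "=" + l; }
    }

    static final Comparator<Row> BY_LENGTH_DESC =
            Comparator.comparingDouble((Row r) -> r.l).reversed();

    // Order described by the specification: ORDER BY first, then DISTINCT.
    static List<Row> sortThenDistinct(List<Row> rows) {
        return rows.stream().sorted(BY_LENGTH_DESC).distinct().collect(Collectors.toList());
    }

    // Reordered plan: DISTINCT first, so the sort sees fewer rows.
    static List<Row> distinctThenSort(List<Row> rows) {
        return rows.stream().distinct().sorted(BY_LENGTH_DESC).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(
                new Row("river:a", 10), new Row("river:b", 30),
                new Row("river:a", 10), new Row("river:c", 20),
                new Row("river:b", 30));
        System.out.println(sortThenDistinct(rows).equals(distinctThenSort(rows)));
    }
}
```

The second plan sorts three rows instead of five here. On real data with many duplicate solutions the saving can be much larger.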

If you want to confirm that this is indeed the case, you can disable that optimisation via the --set arq:optOrderByDistinctApplication=false option.

I suspect what happens in the failing case is that you have two solutions with the custom datatype literals present and we aren’t guaranteeing to provide a stable sort (see PR #1406 that Andy referenced) over those.
