Sorting fails with custom datatype literal
Version
4.7.0-SNAPSHOT
What happened?
Dataset: some DBpedia stuff
Query (“give me longest river”):
SELECT DISTINCT ?uri WHERE {
?uri a <http://dbpedia.org/ontology/River>
{
?uri <http://dbpedia.org/ontology/length> ?l
} UNION {
?uri <http://dbpedia.org/property/length> ?l
}
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1
It fails with HTTP 500 and “Comparison method violates its general contract!”.
DBpedia data is annoying anyway: there are plenty of length literals with different datatypes. For debugging I reduced the literal mix to two datatypes, xsd:double and <http://dbpedia.org/datatype/kilometre>:
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?uri WHERE {
?uri a <http://dbpedia.org/ontology/River>
{
?uri <http://dbpedia.org/ontology/length> ?l
} UNION {
?uri <http://dbpedia.org/property/length> ?l
}
filter(datatype(?l) in (xsd:double, <http://dbpedia.org/datatype/kilometre>))
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1
which still fails.
It does not fail, though, if we also project/select the length literal ?l:
SELECT DISTINCT ?uri ?l WHERE {
?uri a <http://dbpedia.org/ontology/River>
{
?uri <http://dbpedia.org/ontology/length> ?l
} UNION {
?uri <http://dbpedia.org/property/length> ?l
}
filter(datatype(?l) in (xsd:double, <http://dbpedia.org/datatype/kilometre>))
}
ORDER BY DESC(?l) OFFSET 0 LIMIT 1
Looking for the Java exception, I found sources such as https://bugs.openjdk.org/browse/JDK-8234482, which also contains a hint that TimSort needs a sufficient amount of data to fail:
TimSort doesn’t throw this exception in all cases, though, only if there are a sufficiently large number of elements to be merged (generally > 32 elements, often hundreds are required), and if the sorting algorithm happens to detect an contradiction in the comparison method.
The JDK version is Java 11, by the way, if that matters.
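To make the "contradiction in the comparison method" concrete, here is a minimal sketch of how an ordering over mixed datatypes can stop being transitive. The Lit class and the CMP comparator are hypothetical stand-ins made up for illustration; this is not Jena's actual comparison code. The assumption is an ordering that compares numerically within one datatype but falls back to comparing lexical forms across datatypes.

```java
import java.util.Comparator;

public class MixedDatatypeOrder {
    /** Hypothetical stand-in for a typed literal: lexical form plus datatype name. */
    static final class Lit {
        final String lex;
        final String datatype;

        Lit(String lex, String datatype) {
            this.lex = lex;
            this.datatype = datatype;
        }
    }

    /**
     * Hypothetical ordering, NOT Jena's actual comparator:
     * same datatype -> compare numerically,
     * different datatypes -> compare the lexical forms as strings.
     */
    static final Comparator<Lit> CMP = (x, y) ->
            x.datatype.equals(y.datatype)
                    ? Double.compare(Double.parseDouble(x.lex), Double.parseDouble(y.lex))
                    : x.lex.compareTo(y.lex);

    /** True if a < b and b < c hold but a < c does not, i.e. transitivity is broken. */
    static boolean breaksTransitivity(Lit a, Lit b, Lit c) {
        return CMP.compare(a, b) < 0
                && CMP.compare(b, c) < 0
                && CMP.compare(a, c) >= 0;
    }

    public static void main(String[] args) {
        Lit a = new Lit("9", "xsd:double");
        Lit b = new Lit("10", "xsd:double");
        Lit c = new Lit("5", "dbpedia:kilometre");

        // a < b numerically, b < c and c < a lexically: the three form a cycle.
        System.out.println("a < b: " + (CMP.compare(a, b) < 0));
        System.out.println("b < c: " + (CMP.compare(b, c) < 0));
        System.out.println("c < a: " + (CMP.compare(c, a) < 0));
        System.out.println("transitivity broken: " + breaksTransitivity(a, b, c));
    }
}
```

With a comparator like this, whether Arrays.sort actually throws depends on the input size and order, which would match the observation quoted above that TimSort only detects the problem in some cases.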
Relevant output and stacktrace
Query = PREFIX xsd: <http://www.w3.org/2001/XMLSchema#> SELECT DISTINCT ?uri WHERE { ?uri a <http://dbpedia.org/ontology/River> { ?uri <http://dbpedia.org/ontology/length> ?l } UNION { ?uri <http://dbpedia.org/property/length> ?l } filter(datatype(?l) in (xsd:double, <http://dbpedia.org/datatype/kilometre>)) } ORDER BY DESC(?l) OFFSET 0 LIMIT 1
10:12:42 WARN Fuseki :: [32] RC = 500 : Comparison method violates its general contract!
java.lang.IllegalArgumentException: Comparison method violates its general contract!
at java.util.TimSort.mergeHi(TimSort.java:903) ~[?:?]
at java.util.TimSort.mergeAt(TimSort.java:520) ~[?:?]
at java.util.TimSort.mergeCollapse(TimSort.java:448) ~[?:?]
at java.util.TimSort.sort(TimSort.java:245) ~[?:?]
at java.util.Arrays.sort(Arrays.java:1441) ~[?:?]
at org.apache.jena.atlas.data.AbortableComparator.abortableSort(AbortableComparator.java:57) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:205) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.atlas.data.SortedDataBag.iterator(SortedDataBag.java:192) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterSort$SortedBindingIterator.initializeIterator(QueryIterSort.java:88) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.init(IteratorDelayedInitialization.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.atlas.iterator.IteratorDelayedInitialization.hasNext(IteratorDelayedInitialization.java:48) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterPlainWrapper.hasNextBinding(QueryIterPlainWrapper.java:59) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterConvert.hasNextBinding(QueryIterConvert.java:58) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.getInputNextUnseen(QueryIterDistinct.java:113) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterDistinct.hasNextBinding(QueryIterDistinct.java:72) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIterSlice.hasNextBinding(QueryIterSlice.java:76) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorWrapper.hasNextBinding(QueryIteratorWrapper.java:38) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.iterator.QueryIteratorBase.hasNext(QueryIteratorBase.java:116) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.exec.RowSetStream.hasNext(RowSetStream.java:47) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.sparql.engine.ResultSetStream.hasNext(ResultSetStream.java:81) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeQuery(SPARQLQueryProcessor.java:380) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:279) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeWithParameter(SPARQLQueryProcessor.java:224) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:209) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:84) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.ActionProcessor.process(ActionProcessor.java:34) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.ActionBase.process(ActionBase.java:54) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:124) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.servlets.ActionExecLib.execAction(ActionExecLib.java:98) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.server.Dispatcher.dispatchAction(Dispatcher.java:164) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.server.Dispatcher.process(Dispatcher.java:156) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
at org.apache.jena.fuseki.server.Dispatcher.dispatch(Dispatcher.java:83) ~[fuseki-server.jar:4.7.0-SNAPSHOT]
Steps to reproduce
- unzip attachment
sparql --data issue_1634.ttl --query issue_1634.rq
Are you interested in making a pull request?
No response
Issue Analytics
- State:
- Created 10 months ago
- Comments:9 (9 by maintainers)
Top GitHub Comments
This case fails:
Fortunately it turns up quickly by doing an N^3 expansion and testing 3 elements at a time (there are over a billion cases to try, so it is taking a while).
There are lots of failures. A rate of 3%-4% so far.
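For readers who want to try the same kind of check, the N^3 expansion can be sketched roughly as below. This is not the test code used for this issue; countViolations and the toy CYCLIC comparator are hypothetical names chosen for illustration, and the real check would plug in the comparator and values under test instead.

```java
import java.util.Comparator;
import java.util.List;

public class TransitivityScan {
    /** Deliberately cyclic toy comparator over {0, 1, 2}: 0 < 1, 1 < 2, 2 < 0. */
    static final Comparator<Integer> CYCLIC = (x, y) -> {
        int d = Math.floorMod(y - x, 3);
        return d == 0 ? 0 : (d == 1 ? -1 : 1);
    };

    /** Counts ordered triples (a, b, c) where a < b and b < c, but a < c does not hold. */
    static <T> long countViolations(List<T> items, Comparator<? super T> cmp) {
        long bad = 0;
        for (T a : items) {
            for (T b : items) {
                if (cmp.compare(a, b) >= 0) {
                    continue; // only triples that start with a < b are relevant
                }
                for (T c : items) {
                    if (cmp.compare(b, c) < 0 && cmp.compare(a, c) >= 0) {
                        bad++;
                    }
                }
            }
        }
        return bad;
    }

    public static void main(String[] args) {
        List<Integer> items = List.of(0, 1, 2);
        long total = (long) items.size() * items.size() * items.size();
        long bad = countViolations(items, CYCLIC);
        System.out.printf("%d of %d triples violate transitivity%n", bad, total);
    }
}
```

The scan is cubic in the number of values, which is why a full run over a real dataset takes a while; sampling triples would give an estimate of the failure rate sooner.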
So the SPARQL specification section 18.2.5 says that ORDER BY is applied prior to DISTINCT, BUT ARQ includes an optimiser that swaps the ordering, i.e. applies DISTINCT first in cases where all the variables in the ORDER BY are also projected. This is semantically equivalent and is shown to substantially increase performance because you throw out all the non-distinct solutions prior to ordering. So in that case every row is guaranteed to be unique and you can’t ever get any unstable sorting results as every row is different (at least for your dataset), i.e. every row will compare differently to every other row even if individual ordering expressions might produce equal values.

If you want to confirm that this is indeed the case you can disable that optimisation via the --set arq:optOrderByDistinctApplication=false option.

I suspect what happens in the failing case is that you have two solutions with the custom datatype literals present and we aren’t guaranteeing to provide a stable sort (see PR #1406 that Andy referenced) over those.