Poor performance when parsing huge literal in query (e.g. 100MB)
See original GitHub issueThe cause seems to be https://github.com/javacc/javacc/issues/72
We encountered this issue when a SPARQL SERVICE clause was sending a large-ish Geometry literal of USA to Fuseki. It stalls forever trying to parse the query.
Ideally, the buffer would expand exponentially or there is an alternative PR linked in the javacc issue. Currently, the parsing buffer is apparently grown in steps of 2KiB
jstack:
"qtp771666241-136" #136 prio=5 os_prio=0 cpu=13385,35ms elapsed=5538,68s tid=0x00007fa188007800 nid=0x15730d runnable [0x00007fa1341f9000]
java.lang.Thread.State: RUNNABLE
at org.apache.jena.sparql.lang.arq.SimpleCharStream.ExpandBuff(SimpleCharStream.java:42)
at org.apache.jena.sparql.lang.arq.SimpleCharStream.FillBuff(SimpleCharStream.java:103)
at org.apache.jena.sparql.lang.arq.SimpleCharStream.readChar(SimpleCharStream.java:197)
at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.jjMoveNfa_0(ARQParserTokenManager.java:4369)
at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.jjMoveStringLiteralDfa0_0(ARQParserTokenManager.java:211)
at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.getNextToken(ARQParserTokenManager.java:4793)
at org.apache.jena.sparql.lang.arq.ARQParser.jj_ntk_f(ARQParser.java:8162)
at org.apache.jena.sparql.lang.arq.ARQParser.PathElt(ARQParser.java:3603)
at org.apache.jena.sparql.lang.arq.ARQParser.PathEltOrInverse(ARQParser.java:3635)
at org.apache.jena.sparql.lang.arq.ARQParser.PathSequence(ARQParser.java:3565)
at org.apache.jena.sparql.lang.arq.ARQParser.PathAlternative(ARQParser.java:3544)
at org.apache.jena.sparql.lang.arq.ARQParser.Path(ARQParser.java:3538)
at org.apache.jena.sparql.lang.arq.ARQParser.VerbPath(ARQParser.java:3493)
at org.apache.jena.sparql.lang.arq.ARQParser.PropertyListPathNotEmpty(ARQParser.java:3418)
at org.apache.jena.sparql.lang.arq.ARQParser.TriplesSameSubjectPath(ARQParser.java:3365)
at org.apache.jena.sparql.lang.arq.ARQParser.TriplesBlock(ARQParser.java:2512)
at org.apache.jena.sparql.lang.arq.ARQParser.GroupGraphPatternSub(ARQParser.java:2425)
at org.apache.jena.sparql.lang.arq.ARQParser.GroupGraphPattern(ARQParser.java:2387)
at org.apache.jena.sparql.lang.arq.ARQParser.WhereClause(ARQParser.java:858)
at org.apache.jena.sparql.lang.arq.ARQParser.SelectQuery(ARQParser.java:137)
at org.apache.jena.sparql.lang.arq.ARQParser.Query(ARQParser.java:31)
at org.apache.jena.sparql.lang.arq.ARQParser.QueryUnit(ARQParser.java:22)
at org.apache.jena.sparql.lang.ParserARQ$1.exec(ParserARQ.java:48)
at org.apache.jena.sparql.lang.ParserARQ.perform(ParserARQ.java:95)
at org.apache.jena.sparql.lang.ParserARQ.parse$(ParserARQ.java:52)
at org.apache.jena.sparql.lang.SPARQLParser.parse(SPARQLParser.java:33)
at org.apache.jena.query.QueryFactory.parse(QueryFactory.java:144)
at org.apache.jena.query.QueryFactory.create(QueryFactory.java:83)
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:251)
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeBody(SPARQLQueryProcessor.java:234)
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:213)
at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58)
at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:83)
at org.apache.jena.fuseki.servlets.ActionProcessor.process(ActionProcessor.java:34)
at org.apache.jena.fuseki.servlets.ActionBase.process(ActionBase.java:55)
at org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:125)
at org.apache.jena.fuseki.servlets.ActionExecLib.execAction(ActionExecLib.java:99)
at org.apache.jena.fuseki.server.Dispatcher.dispatchAction(Dispatcher.java:164)
at org.apache.jena.fuseki.server.Dispatcher.process(Dispatcher.java:156)
at org.apache.jena.fuseki.server.Dispatcher.dispatch(Dispatcher.java:83)
at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:48)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:450)
at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
at org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
at org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1544)
at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1300)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:717)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
at org.eclipse.jetty.server.Server.handle(Server.java:562)
at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505)
at org.eclipse.jetty.server.HttpChannel$$Lambda$636/0x000000084084d040.dispatch(Unknown Source)
at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$624/0x0000000840830c40.run(Unknown Source)
at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
at java.lang.Thread.run(java.base@11.0.15/Thread.java:829)
query is something simple as
{ ?c ^<http://www.opengis.net/ont/geosparql#sfContains> "<?xml version=\"1.0\" encoding=\"UTF-8\"?><gml:MultiSurface xmlns:gml=\"http://www.opengis.net/ont/gml\" gml:id=\"g2015_2014_0.104.wkb_geometry\" srsDimension=\"2\" srsName=\"urn:ogc:def:crs:EPSG::3857\"><gml:surfaceMember><gml:Polygon gml:id=\"g2015_2014_0.104.wkb_geometry.1\"><gml:exterior><gml:LinearRing><gml:posList>HUGE POS LIST</gml:posList></gml:LinearRing></gml:exterior></gml:Polygon></gml:surfaceMember></gml:MultiSurface>"^^<http://www.opengis.net/ont/geosparql#gmlLiteral> }
automatically injected from a service clause
Issue Analytics
- State:
- Created a year ago
- Comments:31 (20 by maintainers)
Top Results From Across the Web
14 Best Practices to Tune BigQuery SQL Performance
Querying a huge dataset is a pain because it hogs resources and can be extremely slow. It wasn't uncommon to find databases that...
Read more >Can MySQL reasonably perform queries on billions of rows?
MySQL performance with BIGINT fields in an indexed column is ridiculously horrible compared to INT. I made the mistake of doing this once...
Read more >SQL Server Query: Fast with Literal but Slow with Variable
I've checked the 2 query plans and the first query is performing a Clustered index seek on the main table returning 1 record...
Read more >a SQL query performance killer – the basics - SQLShack
Poor query design is one of the top SQL Server performance killers. ... required to parse, compile, and execute a SELECT statement.
Read more >Issues · apache/jena - GitHub
CONSTRUCT query returns N-Triples when UI graph results choice is "Turtle". bug Fuseki ... Poor performance when parsing huge literal in query (e.g....
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Does quite explain why there is a 100Mb literal in a query but no matter.
SPARQL parsing is central so any changes need to be done carefully, and be proven and mature. Like javacc, I’m thinking about unforeseen consequences.
The parser to use is (surprise!) controlled by a registry
SPARQLParserRegistry
so extension code can change the parser for an experimental one.Also –
The jaavcc issue suggests a different approach which is also more efficient - lexical states and lexical actions. The string for token image can be created directly without going through the javacc buffering.
Also if your parser can receive the whole input as a string, you can just use a SimpleCharStream with bufffer size as the length of the string itself and instantiate it with a StringReader. Like:
SimpleCharStream simpleCharStream = new SimpleCharStream(new StringReader(input), input.length(), 1, 1)
Which makes sure it will never call ExpandBuf!
So if the parser is sitting in a service, make it STATIC=false and use one parser per request with this kind of consturctor so you don’t need to worry about expand buf or memory management of the parser.