Poor performance when parsing huge literal in query (e.g. 100MB)

See original GitHub issue

The cause seems to be https://github.com/javacc/javacc/issues/72

We encountered this issue when a SPARQL SERVICE clause was sending a large-ish Geometry literal of USA to Fuseki. It stalls forever trying to parse the query.

Currently, the parsing buffer is apparently grown in fixed steps of 2KiB. Ideally, the buffer would expand exponentially; alternatively, there is a PR linked in the javacc issue.
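To illustrate why the growth strategy matters, here is a rough back-of-the-envelope sketch. It assumes that each expansion copies the whole buffer into a new array, and that the buffer starts at 4096 chars; the class and method names are made up for this example and are not part of javacc or Jena. It only counts copied chars, it does not parse anything.

```java
public class BufferGrowthCost {
    // Total chars copied while growing a buffer from `initial` chars until it can
    // hold `target` chars, adding a fixed `step` on every expansion.
    // Assumption: every expansion copies the entire current buffer.
    static long copiedWithFixedStep(long target, long initial, long step) {
        long size = initial;
        long copied = 0;
        while (size < target) {
            copied += size; // copy existing contents into the new array
            size += step;
        }
        return copied;
    }

    // Same accounting, but the buffer doubles on every expansion.
    static long copiedWithDoubling(long target, long initial) {
        long size = initial;
        long copied = 0;
        while (size < target) {
            copied += size;
            size *= 2;
        }
        return copied;
    }

    public static void main(String[] args) {
        long target = 100L * 1024 * 1024; // a literal of roughly 100M chars
        System.out.println("fixed 2KiB steps: "
                + copiedWithFixedStep(target, 4096, 2048) + " chars copied");
        System.out.println("doubling:         "
                + copiedWithDoubling(target, 4096) + " chars copied");
    }
}
```

Under these assumptions the fixed-step total grows quadratically with the literal size, while the doubling total stays below twice the literal size, which would be consistent with the thread spending its time in ExpandBuff.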

jstack:
"qtp771666241-136" #136 prio=5 os_prio=0 cpu=13385,35ms elapsed=5538,68s tid=0x00007fa188007800 nid=0x15730d runnable  [0x00007fa1341f9000]
   java.lang.Thread.State: RUNNABLE
	at org.apache.jena.sparql.lang.arq.SimpleCharStream.ExpandBuff(SimpleCharStream.java:42)
	at org.apache.jena.sparql.lang.arq.SimpleCharStream.FillBuff(SimpleCharStream.java:103)
	at org.apache.jena.sparql.lang.arq.SimpleCharStream.readChar(SimpleCharStream.java:197)
	at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.jjMoveNfa_0(ARQParserTokenManager.java:4369)
	at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.jjMoveStringLiteralDfa0_0(ARQParserTokenManager.java:211)
	at org.apache.jena.sparql.lang.arq.ARQParserTokenManager.getNextToken(ARQParserTokenManager.java:4793)
	at org.apache.jena.sparql.lang.arq.ARQParser.jj_ntk_f(ARQParser.java:8162)
	at org.apache.jena.sparql.lang.arq.ARQParser.PathElt(ARQParser.java:3603)
	at org.apache.jena.sparql.lang.arq.ARQParser.PathEltOrInverse(ARQParser.java:3635)
	at org.apache.jena.sparql.lang.arq.ARQParser.PathSequence(ARQParser.java:3565)
	at org.apache.jena.sparql.lang.arq.ARQParser.PathAlternative(ARQParser.java:3544)
	at org.apache.jena.sparql.lang.arq.ARQParser.Path(ARQParser.java:3538)
	at org.apache.jena.sparql.lang.arq.ARQParser.VerbPath(ARQParser.java:3493)
	at org.apache.jena.sparql.lang.arq.ARQParser.PropertyListPathNotEmpty(ARQParser.java:3418)
	at org.apache.jena.sparql.lang.arq.ARQParser.TriplesSameSubjectPath(ARQParser.java:3365)
	at org.apache.jena.sparql.lang.arq.ARQParser.TriplesBlock(ARQParser.java:2512)
	at org.apache.jena.sparql.lang.arq.ARQParser.GroupGraphPatternSub(ARQParser.java:2425)
	at org.apache.jena.sparql.lang.arq.ARQParser.GroupGraphPattern(ARQParser.java:2387)
	at org.apache.jena.sparql.lang.arq.ARQParser.WhereClause(ARQParser.java:858)
	at org.apache.jena.sparql.lang.arq.ARQParser.SelectQuery(ARQParser.java:137)
	at org.apache.jena.sparql.lang.arq.ARQParser.Query(ARQParser.java:31)
	at org.apache.jena.sparql.lang.arq.ARQParser.QueryUnit(ARQParser.java:22)
	at org.apache.jena.sparql.lang.ParserARQ$1.exec(ParserARQ.java:48)
	at org.apache.jena.sparql.lang.ParserARQ.perform(ParserARQ.java:95)
	at org.apache.jena.sparql.lang.ParserARQ.parse$(ParserARQ.java:52)
	at org.apache.jena.sparql.lang.SPARQLParser.parse(SPARQLParser.java:33)
	at org.apache.jena.query.QueryFactory.parse(QueryFactory.java:144)
	at org.apache.jena.query.QueryFactory.create(QueryFactory.java:83)
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:251)
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.executeBody(SPARQLQueryProcessor.java:234)
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execute(SPARQLQueryProcessor.java:213)
	at org.apache.jena.fuseki.servlets.ActionService.executeLifecycle(ActionService.java:58)
	at org.apache.jena.fuseki.servlets.SPARQLQueryProcessor.execPost(SPARQLQueryProcessor.java:83)
	at org.apache.jena.fuseki.servlets.ActionProcessor.process(ActionProcessor.java:34)
	at org.apache.jena.fuseki.servlets.ActionBase.process(ActionBase.java:55)
	at org.apache.jena.fuseki.servlets.ActionExecLib.execActionSub(ActionExecLib.java:125)
	at org.apache.jena.fuseki.servlets.ActionExecLib.execAction(ActionExecLib.java:99)
	at org.apache.jena.fuseki.server.Dispatcher.dispatchAction(Dispatcher.java:164)
	at org.apache.jena.fuseki.server.Dispatcher.process(Dispatcher.java:156)
	at org.apache.jena.fuseki.server.Dispatcher.dispatch(Dispatcher.java:83)
	at org.apache.jena.fuseki.servlets.FusekiFilter.doFilter(FusekiFilter.java:48)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:61)
	at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
	at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
	at org.apache.shiro.web.servlet.AdviceFilter.executeChain(AdviceFilter.java:108)
	at org.apache.shiro.web.servlet.AdviceFilter.doFilterInternal(AdviceFilter.java:137)
	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
	at org.apache.shiro.web.servlet.ProxiedFilterChain.doFilter(ProxiedFilterChain.java:66)
	at org.apache.shiro.web.servlet.AbstractShiroFilter.executeChain(AbstractShiroFilter.java:450)
	at org.apache.shiro.web.servlet.AbstractShiroFilter$1.call(AbstractShiroFilter.java:365)
	at org.apache.shiro.subject.support.SubjectCallable.doCall(SubjectCallable.java:90)
	at org.apache.shiro.subject.support.SubjectCallable.call(SubjectCallable.java:83)
	at org.apache.shiro.subject.support.DelegatingSubject.execute(DelegatingSubject.java:387)
	at org.apache.shiro.web.servlet.AbstractShiroFilter.doFilterInternal(AbstractShiroFilter.java:362)
	at org.apache.shiro.web.servlet.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:125)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:202)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
	at org.apache.jena.fuseki.servlets.CrossOriginFilter.handle(CrossOriginFilter.java:284)
	at org.apache.jena.fuseki.servlets.CrossOriginFilter.doFilter(CrossOriginFilter.java:247)
	at org.eclipse.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:210)
	at org.eclipse.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1600)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:506)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:131)
	at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:578)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:223)
	at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:1571)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:221)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1378)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:176)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:463)
	at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:1544)
	at org.eclipse.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:174)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1300)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:129)
	at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:717)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:122)
	at org.eclipse.jetty.server.Server.handle(Server.java:562)
	at org.eclipse.jetty.server.HttpChannel.lambda$handle$0(HttpChannel.java:505)
	at org.eclipse.jetty.server.HttpChannel$$Lambda$636/0x000000084084d040.dispatch(Unknown Source)
	at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:762)
	at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:497)
	at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:282)
	at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:319)
	at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:100)
	at org.eclipse.jetty.io.SelectableChannelEndPoint$1.run(SelectableChannelEndPoint.java:53)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.runTask(AdaptiveExecutionStrategy.java:412)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.consumeTask(AdaptiveExecutionStrategy.java:381)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.tryProduce(AdaptiveExecutionStrategy.java:268)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy.lambda$new$0(AdaptiveExecutionStrategy.java:138)
	at org.eclipse.jetty.util.thread.strategy.AdaptiveExecutionStrategy$$Lambda$624/0x0000000840830c40.run(Unknown Source)
	at org.eclipse.jetty.util.thread.ReservedThreadExecutor$ReservedThread.run(ReservedThreadExecutor.java:407)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:894)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$Runner.run(QueuedThreadPool.java:1038)
	at java.lang.Thread.run(java.base@11.0.15/Thread.java:829)

The query is something as simple as

{ ?c ^<http://www.opengis.net/ont/geosparql#sfContains> "<?xml version=\"1.0\" encoding=\"UTF-8\"?><gml:MultiSurface xmlns:gml=\"http://www.opengis.net/ont/gml\" gml:id=\"g2015_2014_0.104.wkb_geometry\" srsDimension=\"2\" srsName=\"urn:ogc:def:crs:EPSG::3857\"><gml:surfaceMember><gml:Polygon gml:id=\"g2015_2014_0.104.wkb_geometry.1\"><gml:exterior><gml:LinearRing><gml:posList>HUGE POS LIST</gml:posList></gml:LinearRing></gml:exterior></gml:Polygon></gml:surfaceMember></gml:MultiSurface>"^^<http://www.opengis.net/ont/geosparql#gmlLiteral> }

The literal is automatically injected from a SERVICE clause.

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 31 (20 by maintainers)

Top GitHub Comments

afs commented, May 20, 2022 (1 reaction)

Doesn’t quite explain why there is a 100MB literal in a query, but no matter.

SPARQL parsing is central so any changes need to be done carefully, and be proven and mature. Like javacc, I’m thinking about unforeseen consequences.

The parser to use is (surprise!) controlled by a registry SPARQLParserRegistry so extension code can change the parser for an experimental one.

Also –

The javacc issue suggests a different approach which is also more efficient: lexical states and lexical actions. The string for the token image can be created directly without going through the javacc buffering.

new-javacc commented, Jul 9, 2022 (0 reactions)

Also, if your parser can receive the whole input as a string, you can just use a SimpleCharStream whose buffer size is the length of the string itself and instantiate it with a StringReader. Like:

SimpleCharStream simpleCharStream = new SimpleCharStream(new StringReader(input), 1, 1, input.length());

This makes sure it will never call ExpandBuff!

So if the parser is sitting in a service, set STATIC=false and use one parser per request with this kind of constructor, so you don’t need to worry about ExpandBuff or the memory management of the parser.
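The effect of pre-sizing can be sketched without javacc at all. The CountingBuffer class below is a hypothetical stand-in, not the generated SimpleCharStream: it assumes a buffer that grows by 2048 chars whenever it is full, and simply counts how often that happens for a given initial size.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

public class PresizedBufferDemo {
    // Hypothetical stand-in for a token buffer that grows by a fixed step when full.
    static final class CountingBuffer {
        private char[] buf;
        private int len = 0;
        private final int step;
        int expansions = 0;

        CountingBuffer(int initialSize, int step) {
            this.buf = new char[Math.max(1, initialSize)];
            this.step = step;
        }

        // Reads every char from the reader, expanding only when the buffer is full.
        void readAll(Reader in) throws IOException {
            int c;
            while ((c = in.read()) != -1) {
                if (len == buf.length) {
                    char[] bigger = new char[buf.length + step];
                    System.arraycopy(buf, 0, bigger, 0, len);
                    buf = bigger;
                    expansions++;
                }
                buf[len++] = (char) c;
            }
        }
    }

    // Number of expansions needed to buffer `input` starting from `initialSize` chars.
    static int expansionsFor(String input, int initialSize) {
        CountingBuffer b = new CountingBuffer(initialSize, 2048);
        try {
            b.readAll(new StringReader(input));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return b.expansions;
    }

    public static void main(String[] args) {
        String input = "x".repeat(1_000_000); // stands in for a large query string
        System.out.println("small initial buffer: " + expansionsFor(input, 4096) + " expansions");
        System.out.println("pre-sized buffer:     " + expansionsFor(input, input.length()) + " expansions");
    }
}
```

The trade-off is that the whole query must already be in memory as a string, which is the precondition stated in the comment above.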
