JavaCC parser is very slow to parse SQL

I executed a batch INSERT statement, and it ran very slowly. A flame graph showed that SQL parsing took most of the time. Looking at the source code, I see that this project uses a parser generated by JavaCC. Does this have a significant impact on performance? I hope someone can help improve it.

My SQL looks like this:

insert into trade_info (id, ....)
values 
(111, ...),
(222, ...),
(333, ...),
...
-- 1000 rows

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 10

Top GitHub Comments

2 reactions
holmofy commented, Apr 15, 2021

Sorry, I tested it again and found that the native batch and httpClient perform similarly.

The core reason for the earlier slow batch insert is that the Spring framework calls connect.getMetaData() when some fields of the inserted data are null, and ClickHouseConnectionImpl.getMetaData() goes through the LogProxy reflection proxy to record logs.

I think LogProxy can be optimized. Many frameworks probably call ClickHouseConnectionImpl.getMetaData(), and reflection-based logging really hurts performance.
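
To illustrate the kind of overhead described here, the following is a minimal sketch of a reflection-based logging proxy built on java.lang.reflect.Proxy. It is not the driver's actual LogProxy code: the MetaSource interface, the LoggingProxySketch class, and the list used as a log sink are all hypothetical names chosen for the example. The sketch also shows one possible optimization, which is to return the unwrapped target when logging is disabled so that calls skip reflective dispatch entirely.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for an interface such as java.sql.Connection.
interface MetaSource {
    String getMetaData();
}

final class LoggingProxySketch {
    // Wraps target so that every interface call is recorded in sink.
    // A null sink means "logging disabled": the bare target is returned,
    // so callers pay no proxy or reflection cost at all.
    static <T> T wrap(T target, Class<T> iface, List<String> sink) {
        if (sink == null) {
            return target;
        }
        InvocationHandler handler = (proxy, method, args) -> {
            // Every call goes through this handler and Method.invoke,
            // which is the per-call overhead the comment refers to.
            sink.add("call " + method.getName() + " " + Arrays.toString(args));
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the target's own exception
            }
        };
        return iface.cast(Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler));
    }

    public static void main(String[] args) {
        MetaSource real = () -> "metadata";
        List<String> sink = new ArrayList<>();

        MetaSource logged = wrap(real, MetaSource.class, sink);
        System.out.println(logged.getMetaData()); // dispatched via the proxy
        System.out.println(sink);                 // one recorded call

        // With logging disabled the same object comes back unwrapped.
        System.out.println(wrap(real, MetaSource.class, null) == real);
    }
}
```

Whether skipping the proxy is acceptable depends on how the driver decides that logging is enabled, which this sketch does not model.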

1 reaction
zhicwu commented, Apr 15, 2021

Thanks again for the clarification. So to summarize:

  1. The SQL parser’s overhead is too high when dealing with large SQL statements
  2. getMetaData should be optimized, as it slows down JdbcTemplate’s null check - not sure why they call the method so frequently, but it looks like it can be cached
  3. Batch insertion is inconvenient (compared to JdbcTemplate) and can be optimized (~10% overhead)

Batch insertion currently uses a text-based format, so I expect performance to improve after switching to a binary format. I’ll address the last two in the 0.3.1 release, and the first in 0.4.0.
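
The caching idea from point 2 could look roughly like the sketch below. It is only an illustration under stated assumptions, not the fix that shipped in any release: CachedMetaData is a hypothetical class, and the Supplier passed to it stands in for whatever expensive call builds the metadata object. The value is computed once on first use and then reused, so repeated callers such as a framework's null check no longer trigger the expensive path.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical memoizing holder: runs the loader once, then reuses the result.
final class CachedMetaData<T> implements Supplier<T> {
    private final Supplier<T> loader;
    private volatile T value; // null means "not loaded yet"

    CachedMetaData(Supplier<T> loader) {
        this.loader = loader;
    }

    @Override
    public T get() {
        T v = value;
        if (v == null) {
            synchronized (this) {
                v = value;
                if (v == null) {
                    // Only the first caller reaches the expensive loader.
                    // A loader that returns null is simply retried next time.
                    v = loader.get();
                    value = v;
                }
            }
        }
        return v;
    }

    public static void main(String[] args) {
        AtomicInteger loads = new AtomicInteger();
        CachedMetaData<String> meta =
                new CachedMetaData<>(() -> "metadata#" + loads.incrementAndGet());

        for (int i = 0; i < 1000; i++) {
            meta.get(); // repeated calls hit the cached value
        }
        System.out.println(meta.get() + ", loads=" + loads.get());
    }
}
```

A real implementation would also need to decide when the cached object becomes stale, for example after the connection is closed, which this sketch leaves out.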
