`computeTruncatedLength` may cause 20% table scan slow down for Raptor (or Hive)


I’ve been looking into Raptor performance recently. It turns out that `computeTruncatedLength` may consume about 20% of CPU (in my local benchmark). This function is a sanity check that makes sure the Unicode code points are valid and truncates the value otherwise. Although the check is necessary, I’m not sure whether there is another way to avoid such high overhead.

Benchmark: https://github.com/highker/presto/commit/4b74a468f3b0d60799603a551e6d8fe0eb7b531b

Results:

without computeTruncatedLength: 569.7778528263376 MB/s
with computeTruncatedLength: 441.1912459425512 MB/s

The table I used for this benchmark (ORC format, single file, 7.5 GB, 150M rows) is a materialized TPC-H table with a varchar column.

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Comments: 5 (5 by maintainers)

Top GitHub Comments

1 reaction
dain commented, Oct 11, 2018

Does the benchmark contain any data with multibyte characters? If not, I would expect the whole thing to compile down to assembly that only validates that assumption, with no real byte counting, because of this check: https://github.com/prestodb/presto/blob/master/presto-spi/src/main/java/com/facebook/presto/spi/type/Varchars.java#L83 . If that is not happening, I’d look into restructuring the code so that the inlining happens, or hoisting that check closer to the main loop so that the common path avoids that call entirely.
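To illustrate the kind of hoisting suggested here, below is a minimal Java sketch. It is not Presto's code: the class `TruncationSketch` and both methods are hypothetical, it works on a plain `byte[]` rather than a Slice, and it assumes that the linked check is a cheap length comparison of this sort. The idea rests on one fact about UTF-8: every code point takes at least one byte, so a value of N bytes can never hold more than N code points. When the byte length is already within the code point limit, no counting is needed.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; names and signatures are not from Presto.
final class TruncationSketch {
    private TruncationSketch() {}

    // Returns how many bytes of the value to keep so that it holds at most
    // maxCodePoints code points. Assumes the input is well-formed UTF-8.
    static int truncatedByteLength(byte[] utf8, int offset, int length, int maxCodePoints) {
        // Fast path, kept tiny so the JIT can inline it into the caller's loop:
        // N bytes hold at most N code points, so nothing needs to be counted.
        if (length <= maxCodePoints) {
            return length;
        }
        return slowTruncatedByteLength(utf8, offset, length, maxCodePoints);
    }

    // Slow path: walk the bytes and count code point starts.
    private static int slowTruncatedByteLength(byte[] utf8, int offset, int length, int maxCodePoints) {
        int codePoints = 0;
        for (int i = 0; i < length; i++) {
            // Any byte that is not a continuation byte (10xxxxxx) starts a code point.
            if ((utf8[offset + i] & 0xC0) != 0x80) {
                if (codePoints == maxCodePoints) {
                    return i; // the code point starting here would exceed the limit
                }
                codePoints++;
            }
        }
        return length;
    }

    public static void main(String[] args) {
        byte[] sample = "h\u00e9llo".getBytes(StandardCharsets.UTF_8); // 6 bytes, 5 code points
        System.out.println(truncatedByteLength(sample, 0, sample.length, 10)); // fast path
        System.out.println(truncatedByteLength(sample, 0, sample.length, 2));  // slow path
    }
}
```

In a table scan over short ASCII values with a generous varchar length, nearly every call would take the first branch, which is the common path the comment wants to keep free of the counting call.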

0 reactions
highker commented, Oct 11, 2018

Aha~ Works very well~~~~~~
