
Increase table loading speed possible?

See original GitHub issue

Hi,

We are trying to load this table into Tablesaw.

We downloaded the above file onto an SSD and are running this code:

import java.io.File;
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

final String tableSource = "/Users/tischer/Desktop/default.tsv";
System.out.println("Table source: " + tableSource);
CsvReadOptions.Builder builder = CsvReadOptions.builder(new File(tableSource))
        .separator('\t')
        .missingValueIndicator("na", "none", "nan");
long start = System.currentTimeMillis();
Table table = Table.read().usingOptions(builder);
System.out.println("Build Table from File [ms]: " + (System.currentTimeMillis() - start));

This takes around 1600 ms.

Do you have any suggestions for how to potentially speed this up? We are also open to storing the table in another file format if that would help.

Thank you very much!

Issue Analytics

  • State: open
  • Created: a year ago
  • Comments: 7 (1 by maintainers)

Top GitHub Comments

2 reactions
ccleva commented, Oct 18, 2022

Yes, it’s the JVM loading the code from the libraries, and the JIT compiler optimizing it. As far as I know there is no easy way to speed this up (and it’s already highly optimized). This is why code performance is a particularly tricky topic in Java…
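This warm-up effect is easy to reproduce with a stdlib-only sketch (no Tablesaw involved; `parse` here is a hypothetical stand-in for the CSV-parsing work): the first invocation pays for class loading and interpreted execution, while later invocations run JIT-compiled code.

```java
import java.util.Arrays;

public class WarmupDemo {

    // Hypothetical stand-in for a table-parsing step:
    // split lines and cells, parse every cell as a double.
    static double parse(String tsv) {
        double sum = 0;
        for (String line : tsv.split("\n")) {
            for (String cell : line.split("\t")) {
                sum += Double.parseDouble(cell);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        // Build a small in-memory "table" so the demo is self-contained.
        StringBuilder sb = new StringBuilder();
        for (int r = 0; r < 2000; r++) {
            sb.append("1.5\t2.5\t3.5\n");
        }
        String data = sb.toString();

        // Time the same work several times in the same JVM.
        long[] times = new long[5];
        for (int i = 0; i < times.length; i++) {
            long start = System.nanoTime();
            parse(data);
            times[i] = System.nanoTime() - start;
        }
        // The first run typically includes class loading and interpretation;
        // later runs execute JIT-compiled code and are usually much faster.
        System.out.println("run times (us): "
                + Arrays.toString(Arrays.stream(times).map(t -> t / 1_000).toArray()));
    }
}
```

The exact numbers depend on the machine and JVM flags, but the first run is normally a clear outlier, which matches the cold-vs-warm pattern in the measurements below.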

I gave your file a try in Parquet.

  • First step: read the csv, write to parquet, read the parquet file
Build Table from csv [ms]: 1164
Write Table to parquet [ms]: 904
Build Table from parquet [ms]: 181

It looks much faster in parquet, but the JVM is already warm when I read the parquet back.

  • Second step: read the csv / parquet files 4 times in separate programs
Build Table from csv [ms]: 1226
Build Table from csv [ms]: 225
Build Table from csv [ms]: 208
Build Table from csv [ms]: 209

Build Table from parquet [ms]: 1554
Build Table from parquet [ms]: 85
Build Table from parquet [ms]: 68
Build Table from parquet [ms]: 69

While reading the Parquet file is consistently faster on a warm JVM (because it’s a binary format), on a cold JVM it is actually slower, probably because there is more code to load and/or optimize.

You can see code loading taking its toll on the first run if you compare the parquet reader log to the externally timed operation:

DEBUG: Finished reading 100541 rows from default.tsv.parquet in 969 ms
Build Table from parquet [ms]: 1554

Context is very important for performance considerations. Hope this helps.

0 reactions
tischi commented, Oct 18, 2022

I ran another test where I read a table with only two rows!

Table source: /Users/tischer/Desktop/default_regions.tsv
Build Table from File [ms]: 1206
Table source: /Users/tischer/Desktop/default_regions.tsv
Build Table from File [ms]: 12

This is extreme 😉

Is that the JIT building all the code for parsing tables during the first go?

If so, do you have any experience with multi-threading in that regard? To me this suggests it could in fact be better to read the first few tables sequentially, to give the JIT a chance to compile the code, before reading the rest in parallel.
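One way to act on that idea can be sketched with the standard library only (`readTable` is a hypothetical stand-in for a Tablesaw read): read one table on the main thread first, so the parsing code is class-loaded and JIT-compiled once, then fan the remaining reads out to a thread pool. Whether this pays off in practice depends on how much of the cost is JIT warm-up versus I/O.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class WarmThenParallel {

    // Hypothetical stand-in for reading one table; returns its row count.
    static int readTable(String data) {
        return data.split("\n").length;
    }

    public static void main(String[] args) throws Exception {
        List<String> tables = List.of("a\nb", "c\nd\ne", "f");

        // 1. Warm up on a tiny dummy table on the main thread, so the
        //    parsing code is loaded and compiled before the pool starts.
        readTable("warmup");

        // 2. Then read all real tables in parallel on (mostly) warm code.
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> jobs = new ArrayList<>();
        for (String t : tables) {
            jobs.add(() -> readTable(t));
        }
        int totalRows = 0;
        for (Future<Integer> f : pool.invokeAll(jobs)) {
            totalRows += f.get();
        }
        pool.shutdown();
        System.out.println("total rows: " + totalRows); // 2 + 3 + 1 = 6
    }
}
```

Note that JIT compilation itself is concurrent and per-method, so a sequential warm-up read mainly helps when all tables go through the same parsing path; it does not eliminate warm-up entirely.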
