Is it possible to increase table loading speed?
Hi,
We are trying to load this table into Tablesaw.
We downloaded the above file onto an SSD and are running this code:
import java.io.File;
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

final String tableSource = "/Users/tischer/Desktop/default.tsv";
System.out.println("Table source: " + tableSource);
CsvReadOptions.Builder builder = CsvReadOptions.builder(new File(tableSource))
        .separator('\t')
        .missingValueIndicator("na", "none", "nan");
long start = System.currentTimeMillis();
Table table = Table.read().usingOptions(builder);
System.out.println("Build Table from File [ms]: " + (System.currentTimeMillis() - start));
This takes around 1600 ms.
Do you have any suggestions for how to potentially speed this up? We are also open to storing the table in another file format if that would help.
Thank you very much!
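One thing worth trying, sketched below under the assumption that the file's column types are known in advance: Tablesaw's CsvReadOptions lets you declare column types up front via columnTypes, which skips the type-detection pass over the file before the real parse. The temp file and its two columns here are hypothetical stand-ins for default.tsv.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import tech.tablesaw.api.ColumnType;
import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

public class TypedCsvRead {
    // Read a TSV with column types declared up front, skipping type detection.
    static Table readTyped(File file, ColumnType[] types) throws IOException {
        CsvReadOptions options = CsvReadOptions.builder(file)
                .separator('\t')
                .missingValueIndicator("na", "none", "nan")
                .columnTypes(types)
                .build();
        return Table.read().usingOptions(options);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical two-column TSV standing in for default.tsv.
        Path tsv = Files.createTempFile("default", ".tsv");
        Files.writeString(tsv, "name\tscore\na\t1.5\nb\tna\n");

        long start = System.currentTimeMillis();
        Table table = readTyped(tsv.toFile(),
                new ColumnType[] { ColumnType.STRING, ColumnType.DOUBLE });
        System.out.println("Rows: " + table.rowCount());
        System.out.println("Build Table from File [ms]: " + (System.currentTimeMillis() - start));
    }
}
```

Whether this helps noticeably depends on how much of the 1600 ms is spent on type inference versus JVM warm-up, so it is worth timing both variants on the actual file.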
Issue Analytics
- Created a year ago
- Comments: 7 (1 by maintainers)
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Yes, it’s the JVM loading the code from the libraries, and the JIT compiler optimizing it. As far as I know there is no easy way to speed this up (and it’s already highly optimized). This is why code performance is a particularly tricky topic in Java…
I gave your file a try in Parquet.
Reading it back is much faster, but by that point the JVM is already warm.
While reading the Parquet file on a warm JVM is consistently faster (because it is a binary format), on a cold JVM it is actually slower, probably because there is more code to load and/or optimize.
You can see code loading taking its toll on the first run if you compare the Parquet reader log to the externally timed operation.
Context is very important for performance considerations. Hope this helps.
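The cold-versus-warm effect described above can be reproduced without Tablesaw at all. The sketch below (plain JDK, synthetic in-memory data) times the same parsing loop several times; the first run pays for class loading and interpreted execution, while later runs use JIT-compiled code.

```java
import java.util.Locale;

public class WarmupDemo {
    // Parse a small in-memory "TSV" and return the sum of its numeric column.
    static double parse(String tsv) {
        double sum = 0;
        for (String line : tsv.split("\n")) {
            String[] cells = line.split("\t");
            sum += Double.parseDouble(cells[1]);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Build 100,000 synthetic rows of "rowN<TAB>digit".
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 100_000; i++) {
            sb.append("row").append(i).append('\t').append(i % 10).append('\n');
        }
        String tsv = sb.toString().trim();

        // The first run is typically the slowest; later runs benefit from the JIT.
        for (int run = 1; run <= 5; run++) {
            long t0 = System.nanoTime();
            double sum = parse(tsv);
            long ms = (System.nanoTime() - t0) / 1_000_000;
            System.out.printf(Locale.ROOT, "run %d: sum=%.1f, %d ms%n", run, sum, ms);
        }
    }
}
```

The exact timings vary by machine and JVM, but the downward trend across runs is the warm-up the comment is describing.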
I ran another test, where I just read a table with only two rows!
This is extreme 😉
Is that the JIT compiling all the table-parsing code during the first run?
If so, do you have any experience with multi-threading in that regard? This suggests it might actually be better to read many tables sequentially, rather than in parallel, to give the JIT a chance to compile the code first.
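One way to test that hypothesis, sketched below: read a tiny throwaway table a few times before timing the real one, so class loading and some JIT compilation happen up front. Both files here are hypothetical temp files (the large one stands in for default.tsv); a handful of tiny reads may only partially warm the relevant code paths, so this is an experiment rather than a guaranteed win.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

import tech.tablesaw.api.Table;
import tech.tablesaw.io.csv.CsvReadOptions;

public class WarmThenRead {
    static Table readTsv(File file) throws IOException {
        return Table.read().usingOptions(
                CsvReadOptions.builder(file).separator('\t').build());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical tiny warm-up file: reading it repeatedly triggers class
        // loading and gives the JIT a chance to compile the hot parsing paths.
        Path warmup = Files.createTempFile("warmup", ".tsv");
        Files.writeString(warmup, "a\tb\n1\t2\n");
        for (int i = 0; i < 10; i++) {
            readTsv(warmup.toFile());
        }

        // Synthetic "real" table, timed only after the warm-up reads.
        Path real = Files.createTempFile("real", ".tsv");
        StringBuilder sb = new StringBuilder("a\tb\n");
        for (int i = 0; i < 100_000; i++) {
            sb.append(i).append('\t').append(i * 2).append('\n');
        }
        Files.writeString(real, sb.toString());

        long start = System.currentTimeMillis();
        Table table = readTsv(real.toFile());
        System.out.println("Rows: " + table.rowCount());
        System.out.println("Warm read [ms]: " + (System.currentTimeMillis() - start));
    }
}
```

Comparing this timing against a cold read of the same file would show how much of the original 1600 ms is warm-up rather than actual parsing.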