Slow loading of data with `Series.new()`
Loading 100M numbers as an Int32 Series with `Series.new()` on an NVIDIA A10 (PCIe Gen4: 64 GB/s) takes approximately 5s. This translates into an effective bandwidth of 80 MB/s, roughly 800 times less than PCIe Gen4's bandwidth. Is that to be expected? And if so, are there any workarounds?
As a point of comparison, a 93 MB Parquet file on local NVMe storage (approximately 1 GB of uncompressed data) loads in about 320 ms (3.1 GB/s) with `DataFrame.readParquet()`.
Another point of comparison is loading 2 columns of 100M Int32 values from an Apache Arrow Table with `DataFrame.fromArrow(Arrow.tableToIPC(arrowTable))`, which takes 313 ms (2.5 GB/s). The IPC-to-DataFrame step accounts for 71 ms of those 313 ms (11 GB/s).
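For reference, the Arrow comparison can be reproduced along these lines. `DataFrame.fromArrow` and `Arrow.tableToIPC` are as used above; constructing the table via `tableFromArrays` and the timing harness are assumptions about the setup, not the exact code used for the numbers quoted:

```js
const Arrow = require('apache-arrow');
const { DataFrame } = require('@rapidsai/cudf');

const n = 100_000_000;
const a = new Int32Array(n);
const b = new Int32Array(n);

// Build a two-column Arrow Table, serialize it to an IPC buffer,
// then load it into a cuDF DataFrame on the GPU.
const arrowTable = Arrow.tableFromArrays({ a, b });

console.time('fromArrow');
const df = DataFrame.fromArrow(Arrow.tableToIPC(arrowTable));
console.timeEnd('fromArrow');
```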
Top GitHub Comments
@ghalimi How are you constructing the Series? The slowest path is passing a JS Array of numbers, as we have to loop through and copy each value in C++. It will do a block host-to-device (HtoD) copy if you pass it an `Int32Array`, but the performance of that copy depends on the kind of host memory backing the array. Take the following example (run on my Turing RTX 8000):
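A minimal sketch of this pageable path, assuming `Series.new()` accepts a plain `Int32Array` directly (the element count and timing harness here are illustrative):

```js
const { Series } = require('@rapidsai/cudf');

// ~1 billion Int32 values (~4 GB) in ordinary (pageable) host memory.
const data = new Int32Array(1_000_000_000);

console.time('Series.new from pageable Int32Array');
const series = Series.new(data); // block HtoD copy from pageable host memory
console.timeEnd('Series.new from pageable Int32Array');
```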
~1.37 s to copy 3.81 GB is ~2.72 GB/s, close to what you're seeing with Parquet and Arrow. This is because they're all copying from pageable host memory, essentially the slowest kind of HtoD copy in CUDA.
If possible, one workaround is to use pinned host memory (the bindings for which are in `@rapidsai/cuda`):
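A sketch of the pinned-memory workaround: `PinnedMemory` is the page-locked allocation from `@rapidsai/cuda`; wrapping it in an `Int32Buffer` view and passing that view straight to `Series.new()` is an assumption about the API shape rather than a verbatim example.

```js
const { PinnedMemory, Int32Buffer } = require('@rapidsai/cuda');
const { Series } = require('@rapidsai/cudf');

const n = 1_000_000_000; // ~4 GB of Int32 values

// Page-locked (pinned) host allocation, viewed as Int32 elements.
const pinned = new Int32Buffer(new PinnedMemory(n * 4));

// ...fill `pinned` with data here...

console.time('Series.new from pinned memory');
const series = Series.new(pinned); // HtoD copy from pinned host memory
console.timeEnd('Series.new from pinned memory');
```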
The above equates to 43-45 Gb/s, saturating the full PCIe 3 bandwidth of my Turing cards.
Great! That’s going to save us a ton of headache…