Getting data out of a dataframe can be slow (toArray, toPairs, etc)
Hi, this is related to https://github.com/data-forge/data-forge-ts/issues/11. I’m opening a new issue since I don’t have permission to reopen the previous one, and I’m not sure whether the problem is in toPairs or in the use of fillGaps and rollingWindow. The issue is very slow performance from toPairs. I’m copying from my comment in the other issue.
I just pushed up a change to our test repo. The changes are:
- Update package.json to use the most recent version of data-forge
- Slightly change the reported timings to make it clearer where the performance issue happens. Specifically, the slowdown looks like it’s coming from the call to toPairs().
The tests I’m looking at are method-1.js and method-2.js. The only difference between them is:
$ diff method-1.js method-2.js
76c76
< const mySeries = dfWithoutGaps.getSeries('value');
---
> const mySeries = new dataForge.Series(dfWithoutGaps.getSeries('value').toArray());
Output from running the tests:
cberthiaume@slow-lane:~/data-forge-performance-test-issue-11$ node method-1.js
Time to require: 980.9740000000002
Time to create DataFrame and getSeries: 8.874000000000024
Time for rolling window: 0.08599999999978536
Time for toPairs: 2067.8320000000003
cberthiaume@slow-lane:~/data-forge-performance-test-issue-11$ node method-2.js
Time to require: 975.4
Time to create DataFrame and getSeries: 63.06100000000015
Time for rolling window: 0.10500000000001819
Time for toPairs: 17.16599999999994
cberthiaume@slow-lane:~/data-forge-performance-test-issue-11$
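For readers who want to collect timings like the ones above, here is a minimal sketch of a timing helper. The name timeIt and the stand-in workload are hypothetical and are not taken from the test repo; the real scripts would wrap the data-forge calls instead.

```javascript
const { performance } = require('perf_hooks');

// Hypothetical helper (not from the test repo): runs fn once and
// reports the elapsed wall-clock time in milliseconds.
function timeIt(label, fn) {
    const start = performance.now();
    const result = fn();
    const elapsed = performance.now() - start;
    console.log(`${label}: ${elapsed}`);
    return { result, elapsed };
}

// Stand-in workload that builds [index, value] pairs. In the real
// scripts this callback would be the call to mySeries.toPairs().
const { result, elapsed } = timeIt('Time for toPairs', () => {
    const pairs = [];
    for (let i = 0; i < 1000; i++) {
        pairs.push([i, i * 2]);
    }
    return pairs;
});
```

One caveat when reading such numbers: if a library evaluates lazily, the cost of an earlier step may only show up in whichever later step finally consumes the data, so a near-zero timing for one step does not necessarily mean that step is cheap.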
The key difference is the huge gap in the time it takes to call toPairs(). Our use case requires us to call toPairs(), and the only way to get acceptable performance when doing that is to recreate the series as you see in the diff above. However, our needs have changed such that the slowdown introduced by this workaround is becoming a bottleneck. Is there a better way to get good performance from toPairs() without using this workaround? Should I open a separate ticket to track this?
Thanks again for all your help.
Issue Analytics
- Created 5 years ago
- Comments:22 (15 by maintainers)
Top GitHub Comments
Thanks for continuing to give feedback. I’ll look at this soon.
Probably because it’s going through a JavaScript iterator.
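To illustrate the general idea behind that comment, here is a small sketch in plain JavaScript. It is not data-forge's internals; lazyRange and lazyMap are hypothetical stand-ins. It shows that a pipeline built on iterators redoes all of its work every time it is consumed, while a copy materialized into an array is computed only once.

```javascript
// Counts how many times the underlying source has been walked.
let sourcePasses = 0;

// Hypothetical lazy source: nothing is computed until it is iterated.
function lazyRange(n) {
    return {
        *[Symbol.iterator]() {
            sourcePasses++;
            for (let i = 0; i < n; i++) {
                yield i;
            }
        },
    };
}

// Hypothetical lazy stage layered on top of another iterable.
function lazyMap(iterable, fn) {
    return {
        *[Symbol.iterator]() {
            for (const value of iterable) {
                yield fn(value);
            }
        },
    };
}

const lazy = lazyMap(lazyRange(5), v => v * 2);

// Each consumption walks the whole chain again, source included.
const first = Array.from(lazy);
const second = Array.from(lazy);

// Materializing once pays for one more pass; later reads of the
// array do not touch the iterator chain at all.
const materialized = Array.from(lazy);
const copy = materialized.slice();

console.log(sourcePasses); // 3: once each for first, second, materialized
```

This is the same trade the workaround in the diff makes: pay once to build a plain array, then read from the array instead of through the chain of iterators.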