Swifter using only a single core
I am applying swifter to a function that takes several values in addition to a datetime variable.
After running the code, I saw that it used only a single core (6 cores are available). The data has 476k rows, and on a single core the apply takes about 7.5 minutes.
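A quick back-of-the-envelope check of these numbers, as a minimal Python sketch. The helper names are hypothetical, and the parallel figure assumes perfectly linear scaling across 6 cores with no partitioning or serialization overhead, so it is a best case rather than a prediction.

```python
def rows_per_second(rows, minutes):
    """Average throughput for `rows` processed in `minutes` of wall-clock time."""
    return rows / (minutes * 60)


def ideal_parallel_minutes(minutes, cores):
    """Best-case wall-clock time if the work split perfectly across `cores`.

    Assumption: linear scaling with zero overhead (an upper bound on speed-up).
    """
    return minutes / cores


single_core_rate = rows_per_second(476_000, 7.5)
print(f"single core: {single_core_rate:.0f} rows/s")
print(f"ideal on 6 cores: {ideal_parallel_minutes(7.5, 6):.2f} min")
```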
I added set_npartitions(16), which improved the processing time to 3.5 minutes, but it still uses only a single core.
Is there any reason why it can't use all the cores?
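For reference, a minimal sketch of the kind of call described above. The original function was not posted, so the row function shifted_hour and the columns timestamp and offset are hypothetical stand-ins. Importing swifter registers the .swifter accessor on pandas objects, and set_npartitions(n) sets how many partitions swifter uses if it decides to run in parallel; swifter may still choose a plain pandas apply when its sample-based time estimate is small. The plain pandas fallback is only there so the snippet also runs where swifter is not installed.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    HAS_SWIFTER = True
except ImportError:
    HAS_SWIFTER = False


def shifted_hour(row):
    """Hypothetical row function: hour of the timestamp plus a numeric offset."""
    return row["timestamp"].hour + row["offset"]


def apply_rows(df, npartitions=16):
    """Row-wise apply through swifter when available, else plain pandas."""
    if HAS_SWIFTER:
        # Number of partitions swifter uses if it chooses the parallel (dask) path
        return df.swifter.set_npartitions(npartitions).apply(shifted_hour, axis=1)
    # Fallback: ordinary single-core pandas apply
    return df.apply(shifted_hour, axis=1)


df = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            ["2019-06-01 06:00", "2019-06-01 12:30", "2019-06-01 18:45"]
        ),
        "offset": [1, 2, 3],
    }
)
print(apply_rows(df).tolist())
```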
Issue Analytics
- Created 5 years ago
- Comments: 13 (7 by maintainers)
Top GitHub Comments
I did a test with a different function, which in turn calls several other functions, and the performance got even worse. I think this has something to do with the function itself. You could give it a try with solar.get_altitude(). The sample size is about half a million rows. Unfortunately, it is taking about 8.5 minutes at 1030 it/s. It seems that the sample size and the number of internal calls have an effect.
Update: Surprisingly, allow_dask_on_strings(enable=True) solved the problem.
Closing this issue because it seems resolved. Please feel free to re-open if you feel otherwise.
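A sketch of the reported fix, under stated assumptions. As we read the swifter documentation, swifter by default avoids its dask path when the frame contains string (object-dtype) columns, so a row-wise apply over such a frame stays on plain pandas, i.e. a single core; allow_dask_on_strings(enable=True) opts in anyway. The helper has_object_columns is a hypothetical illustration of that condition, not swifter's own code, and the row function tag_row and its columns are made up for the example. As before, the plain pandas fallback only keeps the snippet runnable without swifter.

```python
import pandas as pd
from pandas.api.types import is_object_dtype

try:
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    HAS_SWIFTER = True
except ImportError:
    HAS_SWIFTER = False


def has_object_columns(df):
    """Hypothetical check: does any column hold object dtype (e.g. Python strings)?"""
    return any(is_object_dtype(dtype) for dtype in df.dtypes)


def tag_row(row):
    """Hypothetical row function mixing a string column and a numeric column."""
    return row["site"].upper() + str(row["value"] * 2)


def apply_allowing_strings(df):
    if HAS_SWIFTER:
        # Opt in to swifter's dask path even though the frame has string columns
        return df.swifter.allow_dask_on_strings(enable=True).apply(tag_row, axis=1)
    # Fallback: ordinary single-core pandas apply
    return df.apply(tag_row, axis=1)


# dtype=object is explicit so the column stays object dtype on newer pandas too
df = pd.DataFrame({"site": pd.Series(["a", "b"], dtype=object), "value": [1, 2]})
print(has_object_columns(df))
print(apply_allowing_strings(df).tolist())
```

Whether the dask path then actually beats plain pandas still depends on swifter's sample-based time estimate and on how expensive the row function is.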