Implement fast Cython Series iterator, for speeding up DataFrame.apply

DataFrame.apply makes a large number of calls to Series.__new__, which seriously degrades performance because most of the construction logic isn't necessary there. We could play tricks in Cython with the data pointers to avoid this.
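The overhead described above can be seen from the caller's side with a minimal sketch. It assumes a pandas version where DataFrame.apply accepts raw=True; the frame and the row_range function are made up for illustration and do not come from the issue. By default each row reaches the function wrapped in a Series, while raw=True hands it a plain ndarray and skips that construction.

```python
import numpy as np
import pandas as pd

# Hypothetical 4x3 frame, used only for illustration.
df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=["a", "b", "c"])

seen_types = []

def row_range(row):
    # Record what kind of object apply passes in, then do a cheap reduction
    # that works on both a Series and an ndarray.
    seen_types.append(type(row).__name__)
    return row.max() - row.min()

# Default path: every row is passed as a Series.
default_result = df.apply(row_range, axis=1)

# raw=True path: every row is passed as a NumPy array instead.
raw_result = df.apply(row_range, axis=1, raw=True)

print(sorted(set(seen_types)))
print(np.allclose(default_result.values, raw_result.values))
```

Both calls compute the same values; only the type of the object passed to the function differs, which is where the per-row cost comes from.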

Issue Analytics

  • State: closed
  • Created 12 years ago
  • Comments: 7 (4 by maintainers)

Top GitHub Comments

1 reaction
wesm commented, Nov 13, 2011

OK, I made some further tweaks, and apply now beats apply_along_axis by quite a bit in the axis=1 case with your example (in the axis=0 case, most of the time is spent calling unique):

In [6]: timeit data.apply(fn, axis=1, raw=True)
1 loops, best of 3: 288 ms per loop

In [7]: timeit data.apply(fn, axis=0, raw=True)
10 loops, best of 3: 82 ms per loop

In [8]: timeit np.apply_along_axis(fn, 1, data.values)
1 loops, best of 3: 518 ms per loop

In [9]: timeit np.apply_along_axis(fn, 0, data.values)
10 loops, best of 3: 82.7 ms per loop
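
A rough way to rerun a comparison like the one above on a current install is sketched here. The frame shape and the body of fn are assumptions (the thread only says that fn spends its time in unique), so absolute timings will not match the quoted numbers.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical data: small integers so that unique() has real work to do.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 10, size=(200, 20)))

def fn(values):
    # Assumed stand-in for the function in the thread: count distinct values.
    return len(np.unique(values))

# Both paths should agree on the result before timing them.
via_apply = data.apply(fn, axis=1, raw=True)
via_numpy = np.apply_along_axis(fn, 1, data.values)
assert via_apply.tolist() == via_numpy.tolist()

candidates = {
    "data.apply(fn, axis=1, raw=True)": lambda: data.apply(fn, axis=1, raw=True),
    "np.apply_along_axis(fn, 1, data.values)": lambda: np.apply_along_axis(fn, 1, data.values),
}
for label, call in candidates.items():
    seconds = timeit.timeit(call, number=3) / 3
    print(f"{label}: {seconds * 1000:.2f} ms per loop")
```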

0 reactions
natekupp commented, Nov 14, 2011

Thanks Wes!


Top Results From Across the Web

How To Make Your Pandas Loop 71803 Times Faster
Looping through Pandas DataFrames can be very slow — I will show you some very fast options. If you use Python and Pandas...
Enhancing performance — pandas 1.5.2 documentation
In this part of the tutorial, we will investigate how to speed up certain functions operating on pandas DataFrame using three different techniques:...
Fast, Flexible, Easy and Intuitive: How to Speed Up Your ...
Use .iterrows() : iterate over DataFrame rows as (index, pd.Series ) pairs. While a Pandas Series is a flexible data structure, ...
How to speed up pandas with cython (or numpy)
If you're just trying to do it faster and not specifically using cython, I'd just do it in plain numpy (about 50x faster)....
Pandas Iterate Over Rows – 5 Methods - Data Independent
Pandas Iterate Over Rows - 5 different ways to iterate over data in your Pandas DataFrame. Pick the fastest one for your use...
