Implement fast Cython Series iterator, for speeding up DataFrame.apply

Making a large number of calls to Series.__new__ seriously degrades performance, because most of its logic isn't necessary here. We could play tricks in Cython with the data pointers to avoid this.
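To make the overhead concrete, here is a minimal pure-Python sketch, not the Cython pointer trick itself. It contrasts building a fresh Series for every row with handing the function plain ndarray rows. The helper names `row_sums_via_series` and `row_sums_via_raw` and the tiny example frame are hypothetical, chosen only for illustration.

```python
import numpy as np
import pandas as pd


def row_sums_via_series(df):
    # Builds a new Series for every row; this per-row construction
    # is the overhead the issue describes.
    return np.array([pd.Series(row, index=df.columns).sum() for row in df.values])


def row_sums_via_raw(df):
    # Works on plain ndarray rows and skips Series construction entirely.
    return np.array([row.sum() for row in df.values])


# Small hypothetical frame; a real benchmark would use far more rows.
df = pd.DataFrame(np.arange(12.0).reshape(4, 3), columns=list("abc"))
via_series = row_sums_via_series(df)
via_raw = row_sums_via_raw(df)
```

Both helpers return the same values; the difference only shows up in timing once the frame has many rows.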

Issue Analytics

Created 12 years ago
Comments: 7 (4 by maintainers)

Top GitHub Comments

wesm commented, Nov 13, 2011

OK, I made some further tweaks, and apply now beats apply_along_axis by a fair margin in the axis=1 case with your example (in the axis=0 case, most of the time is spent calling unique):

In [6]: timeit data.apply(fn, axis=1, raw=True)
1 loops, best of 3: 288 ms per loop

In [7]: timeit data.apply(fn, axis=0, raw=True)
10 loops, best of 3: 82 ms per loop

In [8]: timeit np.apply_along_axis(fn, 1, data.values)
1 loops, best of 3: 518 ms per loop

In [9]: timeit np.apply_along_axis(fn, 0, data.values)
10 loops, best of 3: 82.7 ms per loop
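The function `fn` and the contents of `data` are not shown in the thread, so the following is only a sketch of how such a comparison might be set up. It assumes a stand-in `fn` that counts distinct values with `np.unique` (the comment above only says that `unique` dominates the axis=0 case) and a small random integer frame; both are assumptions, not the original benchmark.

```python
import numpy as np
import pandas as pd


def fn(arr):
    # Hypothetical stand-in for the function from the issue:
    # count the distinct values in a 1-D array.
    return len(np.unique(arr))


# Assumed data; the original frame from the issue is not available.
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.integers(0, 5, size=(100, 8)))

# raw=True passes each row or column to fn as an ndarray, not a Series.
by_row_pandas = data.apply(fn, axis=1, raw=True)
by_col_pandas = data.apply(fn, axis=0, raw=True)

# The NumPy equivalents operate on the underlying 2-D array.
by_row_numpy = np.apply_along_axis(fn, 1, data.values)
by_col_numpy = np.apply_along_axis(fn, 0, data.values)
```

Wrapping each call in `timeit`, as in the session above, would give a comparison of the same shape, though the numbers will differ by machine and library version.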

natekupp commented, Nov 14, 2011

Thanks Wes!
