Slow Performance of Swifter for Text Preprocessing

Hi @jmcarpenter2,

Dear Swifter Folks,

Recently, I found that swifter runs 5-10x slower than a vanilla pandas apply when the function is not vectorized (in my case, text preprocessing).

Here is the experiment:


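# Run in IPython/Jupyter: %time is an IPython magic, not plain Python
import pandas as pd
import swifter  # importing swifter registers the .swifter accessor on pandas objects

def clean_text(text):
    # Plain Python string operations, called once per row
    text = text.strip()
    text = text.replace(' ', '_')
    return text

N_rows = 7000000
df_data = pd.DataFrame([["i want to break free"]] * N_rows, columns=["text"])

# Write each result to its own column so both calls process the same raw text
%time df_data['text_swifter'] = df_data['text'].swifter.apply(clean_text)

%time df_data['text_pandas'] = df_data['text'].apply(clean_text)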

Is this expected? Let's discuss to make sure I'm not missing something. Thank you!

Issue Analytics

Created: 5 years ago
Reactions: 4
Comments: 26 (10 by maintainers)

Top GitHub Comments

10 reactions

jmcarpenter2 commented, Apr 25, 2019

For anyone reading this issue –

If you are doing processing on text data and want to try to increase speed with swifter, you should try adding allow_dask_on_strings() to your command chain.

For example, df.swifter.allow_dask_on_strings().apply(foo) lets swifter attempt to use dask on your text data, which is not allowed by default.

Please see the discussion above for why this is the default. Long story short: it can actually run slower than a pandas apply.
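A minimal sketch of that opt-in, reusing clean_text from the issue on a small hypothetical DataFrame and following the df.swifter.allow_dask_on_strings().apply(...) form from the comment (row-wise, axis=1). The try/except import guard and the plain pandas fallback are additions so the snippet still runs where swifter is not installed; they are not part of swifter's API.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  # importing registers the .swifter accessor
    HAVE_SWIFTER = True
except ImportError:
    HAVE_SWIFTER = False  # assumption: fall back to plain pandas for illustration


def clean_text(text):
    # Same per-string cleanup as in the issue
    return text.strip().replace(" ", "_")


df = pd.DataFrame({"text": ["i want to break free"] * 1000})

if HAVE_SWIFTER:
    # Opt in to dask for string data, which swifter does not allow by default
    cleaned = df.swifter.allow_dask_on_strings().apply(
        lambda row: clean_text(row["text"]), axis=1
    )
else:
    # Fallback path: ordinary row-wise pandas apply
    cleaned = df.apply(lambda row: clean_text(row["text"]), axis=1)
```

Whether dask is actually chosen, and whether it helps, still depends on data size and the cost of the function, so it is worth timing both paths on your own data.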

So if you are not seeing a performance boost from swifter and your dataframe contains text data, try allow_dask_on_strings(). It is more likely to increase speed if the function only uses the text column as a lookup rather than mutating it.
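One reading of "used as a lookup" (our assumption, not spelled out in the comment): the function only reads the string, for example as a dictionary key, and returns a number instead of building new strings. The WEIGHTS table, the weighted_value function, and the color/value columns below are hypothetical, and the import guard with a pandas fallback is there only so the sketch runs without swifter.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  # importing registers the .swifter accessor
    HAVE_SWIFTER = True
except ImportError:
    HAVE_SWIFTER = False  # assumption: fall back to plain pandas for illustration

# Hypothetical lookup table keyed by the text column
WEIGHTS = {"red": 1.0, "green": 2.5, "blue": 4.0}


def weighted_value(row):
    # The string is only read as a dict key; the function returns a float,
    # so no new strings are created per row
    return WEIGHTS[row["color"]] * row["value"]


df = pd.DataFrame({
    "color": ["red", "green", "blue"] * 1000,
    "value": [1.0, 2.0, 3.0] * 1000,
})

if HAVE_SWIFTER:
    result = df.swifter.allow_dask_on_strings().apply(weighted_value, axis=1)
else:
    result = df.apply(weighted_value, axis=1)
```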

3 reactions

jayneflux commented, Jul 28, 2020

I haven’t been using the library for a while, but it’s really great to see things resolved with such awesome bug-closing notes. Folks like you make the OSS world the magical place that it is. Thank you so much, Jason!