Slow Performance of Swifter for Text Preprocessing


Hi @jmcarpenter2,

Dear Swifter Folks,

Recently, I found that swifter is 5-10x slower than a vanilla pandas apply when the applied function is not vectorized (in my case, text preprocessing).

The experiment is like this:


import pandas as pd
import swifter

def clean_text(text):
    text = text.strip()
    text = text.replace(' ', '_')
    return text

N_rows = 7000000
df_data = pd.DataFrame([["i want to break free"]] * N_rows, columns=["text"])

%time df_data['text'] = df_data['text'].swifter.apply(clean_text)

%time df_data['text'] = df_data['text'].apply(clean_text)

Is this expected? Let's discuss to make sure I'm not missing something. Thank you!
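The `%time` magic above only works in IPython or Jupyter. As a minimal sketch, the same comparison can be run as a plain script with `time.perf_counter`. The helper `time_call`, the smaller row count, and the fallback when swifter is not installed are assumptions made for illustration and are not part of the original report. Timings will vary by machine, so no expected numbers are shown.

```python
import time

import pandas as pd

try:
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    HAVE_SWIFTER = True
except ImportError:  # assumption: time pandas alone if swifter is missing
    HAVE_SWIFTER = False


def clean_text(text):
    # Same preprocessing as in the report: trim, then replace spaces.
    text = text.strip()
    return text.replace(" ", "_")


def time_call(fn):
    """Hypothetical helper: return (result, elapsed_seconds) for a callable."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


# Far fewer rows than the 7,000,000 in the report, to keep the run short.
n_rows = 100_000
df_data = pd.DataFrame({"text": ["i want to break free"] * n_rows})

# Results go into separate variables so both runs see the same raw input.
pandas_result, pandas_secs = time_call(lambda: df_data["text"].apply(clean_text))
print(f"pandas apply:  {pandas_secs:.3f}s")

if HAVE_SWIFTER:
    swifter_result, swifter_secs = time_call(
        lambda: df_data["text"].swifter.apply(clean_text)
    )
    print(f"swifter apply: {swifter_secs:.3f}s")
```

Unlike the snippet above, this version does not assign the result back to the column, so the second measurement is not run on text that has already been cleaned.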

Issue Analytics

  • State: closed
  • Created 5 years ago
  • Reactions: 4
  • Comments: 26 (10 by maintainers)

Top GitHub Comments

jmcarpenter2 commented, Apr 25, 2019 (10 reactions)

For anyone reading this issue –

If you are doing processing on text data and want to try to increase speed with swifter, you should try adding allow_dask_on_strings() to your command chain.

For example df.swifter.allow_dask_on_strings().apply(foo) will allow swifter to attempt using dask on your text data, which by default is not allowed.

Please see the discussion above for why this is the default. Long story short: it can actually run slower than a pandas apply.

So if swifter gives you no performance boost and your dataframe contains text data, try allow_dask_on_strings(). It is more likely to increase speed if the text column is only used as a lookup rather than being mutated by the function call itself.
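A minimal sketch of the chain described in this comment, applied to a text column. The sample data and the `clean_text` function mirror the original report. The fallback to plain pandas when swifter is not installed is an assumption added so the snippet runs anywhere. On a frame this small swifter may well decide that a plain pandas apply is sufficient, so this shows the call shape and not a speedup.

```python
import pandas as pd

try:
    import swifter  # noqa: F401  (importing registers the .swifter accessor)
    HAVE_SWIFTER = True
except ImportError:  # assumption: fall back to pandas if swifter is missing
    HAVE_SWIFTER = False


def clean_text(text):
    # Trim surrounding whitespace, then replace inner spaces.
    return text.strip().replace(" ", "_")


df = pd.DataFrame({"text": ["  i want to break free "] * 10_000})

# Plain pandas result, used as the reference.
expected = df["text"].apply(clean_text)

if HAVE_SWIFTER:
    # Opt in to letting swifter consider dask for string data.
    result = df["text"].swifter.allow_dask_on_strings().apply(clean_text)
else:
    result = expected

print(result.head(3).tolist())
```

Whichever path swifter picks, the values should match the plain pandas apply. Only the execution strategy changes, so it is worth timing both on your own data before adopting the option.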

jayneflux commented, Jul 28, 2020 (3 reactions)

I haven’t been using the library for a while but it’s really great to see things resolved with such awesome bug closing notes. Folks like you make the OSS world the magical place that it is. Thank you so much Jason!
