df.loc much slower compared to pandas

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux 7 (Core)
  • Modin version (modin.__version__): 0.8.3
  • Python version: 3.6.8
  • Code we can use to reproduce:

    %%timeit -r 4
    import pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]

    %%timeit -r 4
    import modin.pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]

Describe the problem

df.loc takes much longer to run in Modin than in vanilla pandas. The timings were recorded with timeit in a Jupyter notebook, averaged over 4 runs (see the code snippet above to reproduce).

Source code / logs

3.41 ms ± 268 µs per loop using vanilla pandas df.loc
29.4 ms ± 514 µs per loop using Modin pandas df.loc
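Assuming each reproduction cell has `%%timeit -r 4` on its own line, the timed body includes the import, the DataFrame construction, and `set_index`, so the figures above cover more than the `df.loc` lookup alone. A minimal sketch of timing only the lookup with the standard `timeit` module follows. The helper name `build_frame` and the choice of 100 calls per run are illustrative assumptions, and the block uses plain pandas; swapping the import for `modin.pandas` is the assumed way to get the comparison figure.

```python
import timeit

import pandas as pd  # assumption: swap for `import modin.pandas as pd` to compare


def build_frame():
    """Build the small MultiIndex frame from the issue (hypothetical helper name)."""
    d = {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]}
    return pd.DataFrame(d).set_index(["col1", "col2"])


# Setup happens once, outside the timed region.
df = build_frame()

# Partial lookup on the first index level, as in the issue.
result = df.loc[1]

# Time only the lookup: 4 runs of 100 calls each (the call count is an arbitrary choice).
calls_per_run = 100
runs = timeit.repeat(lambda: df.loc[1], repeat=4, number=calls_per_run)
per_call_ms = [1000 * t / calls_per_run for t in runs]

print("df.loc[1] per call (ms), one value per run:", per_call_ms)
```

Comparing these per-call times between the two libraries would show how much of the gap comes from the lookup itself rather than from setup work.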

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

devin-petersohn commented, Mar 17, 2021 (1 reaction)

apply can be used as an alternative to looping.

We are working on a query optimizer that can detect Python loops and compose them into a single call, but it is not finished yet. Detecting loops perfectly is somewhat complicated because of early loop exits and skipped iterations.
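The loop under discussion is not shown in this excerpt, so the following is only a sketch of the general idea: replacing one `df.loc` call per row with a single `apply` call. The frame, the per-row computation, and the variable names are illustrative assumptions, and plain pandas is used here rather than Modin.

```python
import pandas as pd  # assumption: the same code is expected to work with modin.pandas

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Looping version: one .loc call per row, each paying the lookup overhead.
looped = []
for label in df.index:
    row = df.loc[label]
    looped.append(row["col1"] + row["col3"])

# apply version: a single call that the library can schedule as one operation.
applied = df.apply(lambda row: row["col1"] + row["col3"], axis=1)

print(looped)
print(applied.tolist())
```

Both versions compute the same per-row values; the difference is that the `apply` form hands the whole operation to the library in one call instead of issuing many small lookups.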

ayushdas commented, Mar 17, 2021 (0 reactions)

Sounds cool! Thanks.

Top Results From Across the Web

Pandas dataframe.loc method too slow - python - Stack Overflow
You can access cell values with numpy by converting your dataframe to a numpy array. This method is faster than the .loc method....
Is Pandas really that slow? - Medium
After understanding Pandas more thoroughly and gaining some experienced I figured out that in most cases, Pandas is anything but slow.
Poor performance for .loc and .iloc compared to .ix #6683
But it seems the performance of .loc and .iloc is 20-30 times slower than .ix (I am using Pandas 0.13.1) .ix takes 4.54897093773...
How to Speed Up Your Pandas Code by 10x | Built In
This mean NumPy can be significantly faster than Pandas. Converting a DataFrame from Pandas to NumPy is relatively straightforward.
Indexing and selecting data — pandas 1.5.2 documentation
pandas aligns all AXES when setting Series and DataFrame from .loc , and .iloc . This will not modify df because the column...
