df.loc much slower compared to pandas
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): CentOS Linux 7 (Core)
- Modin version (modin.__version__): 0.8.3
- Python version: 3.6.8
- Code we can use to reproduce:

    %%timeit -r 4
    import pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]

    %%timeit -r 4
    import modin.pandas as pd
    d = {'col1': [1, 2], 'col2': [3, 4], 'col3': [5, 6]}
    df = pd.DataFrame(d)
    df = df.set_index(['col1', 'col2'])
    df.loc[1]
Describe the problem
df.loc takes much longer to run in Modin than in vanilla pandas. The timings were recorded in a Jupyter notebook with %%timeit, averaged over 4 runs (see the code snippets above to reproduce).
Source code / logs
3.41 ms ± 268 µs per loop using vanilla pandas df.loc
29.4 ms ± 514 µs per loop using Modin pandas df.loc
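Note that each %%timeit cell above times the whole cell: the import, the DataFrame construction, set_index, and df.loc together. A minimal sketch of how the lookup could be timed on its own, using the standard timeit module and vanilla pandas, is shown here. The helper names build and lookup and the repetition count n are illustrative assumptions, not part of the original report. Swapping the import to modin.pandas, as in the report, should give the corresponding Modin numbers.

```python
import timeit

import pandas as pd

d = {"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]}

# Build the indexed frame once, outside the timed region.
df = pd.DataFrame(d).set_index(["col1", "col2"])


def build():
    # Hypothetical helper: construction plus set_index, with no lookup.
    return pd.DataFrame(d).set_index(["col1", "col2"])


def lookup():
    # Hypothetical helper: only the df.loc call on the prebuilt frame.
    return df.loc[1]


n = 200  # assumed repetition count, chosen only to keep the run short
build_ms = timeit.timeit(build, number=n) / n * 1e3
lookup_ms = timeit.timeit(lookup, number=n) / n * 1e3

print(f"construct + set_index: {build_ms:.3f} ms per call")
print(f"df.loc[1] only:        {lookup_ms:.3f} ms per call")
```

Comparing the two figures for each library would show how much of the reported gap comes from df.loc itself and how much from building the frame.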
apply can be used as an alternative to looping. We are working on a query optimizer that can detect Python loops and compose them into a single call, but it is not finished yet. Detecting loops perfectly is somewhat complicated because of early loop exits and skipped iterations.
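As a rough illustration of the apply suggestion, the sketch below replaces a Python loop of per-row df.loc lookups with a single apply call. The per-row computation is hypothetical and chosen only for the example; the loop from the original discussion is not shown in this thread. It uses vanilla pandas, and the same calls are expected to work with modin.pandas since it mirrors the pandas API.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2], "col2": [3, 4], "col3": [5, 6]})

# Loop version: one df.loc lookup per row label, each a separate call.
looped = []
for label in df.index:
    row = df.loc[label]
    looped.append(row["col1"] + row["col2"] * row["col3"])

# apply version: the same (hypothetical) computation as one call over rows.
applied = df.apply(lambda row: row["col1"] + row["col2"] * row["col3"], axis=1)

# When the computation is plain column arithmetic, it can also be written
# without apply at all, as whole-column operations.
vectorized = df["col1"] + df["col2"] * df["col3"]

print(looped)
print(applied.tolist())
print(vectorized.tolist())
```

The point of the single call is that the library sees the whole operation at once instead of paying per-call overhead for every df.loc lookup.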
Sounds cool! Thanks.