pivot on Ray engine is slow
System information
- Modin version: 643596d5f9e519358fe785ccd081ba05edb624ee
- Exact command to reproduce:
def test2():
    import modin.pandas as pd
    import pandas
    import numpy as np
    import numpy.random
    from time import time
    size = 10**6
    df = pandas.DataFrame(numpy.random.choice(pd.date_range(start='1/1/2021', end='3/1/2021'), size=size), columns=["timestamp"])
    df["description"] = pandas.Series([f"test_string{x}" for x in range(size)])
    df["numbers"] = pandas.Series(numpy.random.choice(np.random.uniform(1, 10, size=(10_000,)), size=size))
    modin_df = pd.DataFrame(df)
    for _ in range(3):
        start = time()
        df.pivot(index="description", columns="timestamp", values="numbers")
        print(f"pandas time: {time()-start}")
    print("\n\n")
    for _ in range(3):
        start = time()
        modin_df.pivot(index="description", columns="timestamp", values="numbers")
        print(f"modin time: {time()-start}")
test2()
Describe the problem
Modin (8 cores) is about 2.5 times slower than pandas.
Source code / logs
pandas time: 2.3854618072509766
pandas time: 2.334975242614746
pandas time: 2.335522413253784
modin time: 5.825315237045288
modin time: 5.775348424911499
modin time: 5.766143798828125
More recent performance results (for 9782a027568d9ad16bf2c3dea434646cec5e4898):
pandas time: 2.2864537239074707
pandas time: 2.187014579772949
pandas time: 2.144266366958618
modin time: 5.857679843902588
modin time: 5.642812728881836
modin time: 5.5898356437683105
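As a quick check of the "2.5 times slower" figure, the slowdown can be computed from the mean of each set of three timings logged above. This is only a sketch: the numbers are copied from the logs, and the variable and function names are illustrative.

```python
from statistics import mean

# Timings copied from the logs above (seconds).
first_run = {
    "pandas": [2.3854618072509766, 2.334975242614746, 2.335522413253784],
    "modin": [5.825315237045288, 5.775348424911499, 5.766143798828125],
}
second_run = {
    "pandas": [2.2864537239074707, 2.187014579772949, 2.144266366958618],
    "modin": [5.857679843902588, 5.642812728881836, 5.5898356437683105],
}


def slowdown(run):
    """Return mean modin time divided by mean pandas time."""
    return mean(run["modin"]) / mean(run["pandas"])


ratio_first = slowdown(first_run)
ratio_second = slowdown(second_run)
print(f"first run slowdown: {ratio_first:.2f}x")
print(f"second run slowdown: {ratio_second:.2f}x")
```

Both ratios land close to the 2.5x figure quoted in the problem description.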
Issue Analytics
- State:
- Created 2 years ago
- Comments:9 (9 by maintainers)
@dchigarev is there an easy way to fix pivot performance?

I have added some blocking calls to repr() to make sure we're not measuring just issuing the jobs, and here's the run result on my machine (6 workers used):

This was measured against fbd1e2a1b91170a3c45ce9565b1051bb2d55e4eb
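The blocking approach mentioned in the comment above could look roughly like the helper below. It is a minimal sketch, not the code the maintainer actually ran: the name timed_blocking is hypothetical, and it assumes, as the comment implies, that calling repr() on the result forces an asynchronous engine to finish the work before the timer stops.

```python
from time import perf_counter


def timed_blocking(fn, repeats=3):
    """Time fn() several times, calling repr() on the result inside the timed region.

    Assumption: repr() blocks until the result is materialized, so the
    measurement covers the computation and not only the job submission.
    """
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        result = fn()
        repr(result)  # blocking call; return value is discarded
        timings.append(perf_counter() - start)
    return timings


# Hypothetical usage with the frames from the reproducer above:
# timed_blocking(lambda: modin_df.pivot(index="description",
#                                       columns="timestamp",
#                                       values="numbers"))
```

Keeping repr() inside the timed region means both pandas and Modin pay the same formatting cost, so the comparison stays like-for-like.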