pivot on Ray engine is slow

System information

  • Modin version: 643596d5f9e519358fe785ccd081ba05edb624ee
  • Exact command to reproduce:
def test2():
    import modin.pandas as pd
    import pandas
    import numpy as np
    import numpy.random
    from time import time

    size = 10**6
    df = pandas.DataFrame(numpy.random.choice(pd.date_range(start='1/1/2021', end='3/1/2021'), size=size), columns=["timestamp"])
    df["description"] = pandas.Series([f"test_string{x}" for x in range(size)])
    df["numbers"] = pandas.Series(numpy.random.choice(np.random.uniform(1, 10, size=(10_000,)), size=size))

    modin_df = pd.DataFrame(df)
    for _ in range(3):
        start = time()
        df.pivot(index="description", columns="timestamp", values="numbers")
        print(f"pandas time: {time()-start}")
    print("\n\n")
    for _ in range(3):
        start = time()
        modin_df.pivot(index="description", columns="timestamp", values="numbers")
        print(f"modin time: {time()-start}")

test2()

Describe the problem

Modin (8 cores) is about 2.5 times slower than pandas.

Source code / logs

pandas time: 2.3854618072509766
pandas time: 2.334975242614746
pandas time: 2.335522413253784


modin time: 5.825315237045288
modin time: 5.775348424911499
modin time: 5.766143798828125

More recent performance results (for commit 9782a027568d9ad16bf2c3dea434646cec5e4898):

pandas time: 2.2864537239074707
pandas time: 2.187014579772949
pandas time: 2.144266366958618



modin time: 5.857679843902588
modin time: 5.642812728881836
modin time: 5.5898356437683105
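As a rough check of the "about 2.5 times slower" figure, the two sets of timings above can be averaged and compared. This is only a sketch: the timings are copied from the logs in this issue, the dictionary keys are abbreviated commit hashes, and the names `runs` and `slowdown` are made up for illustration.

```python
from statistics import mean

# Timings in seconds, copied from the logs above; keys are abbreviated commit hashes.
runs = {
    "643596d": {
        "pandas": [2.3854618072509766, 2.334975242614746, 2.335522413253784],
        "modin": [5.825315237045288, 5.775348424911499, 5.766143798828125],
    },
    "9782a02": {
        "pandas": [2.2864537239074707, 2.187014579772949, 2.144266366958618],
        "modin": [5.857679843902588, 5.642812728881836, 5.5898356437683105],
    },
}


def slowdown(timings):
    """Return mean Modin time divided by mean pandas time."""
    return mean(timings["modin"]) / mean(timings["pandas"])


ratios = {commit: slowdown(timings) for commit, timings in runs.items()}
for commit, ratio in ratios.items():
    print(f"{commit}: Modin is {ratio:.2f}x slower than pandas")
```

Both ratios come out close to 2.5, which matches the description of the problem.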

Issue Analytics

  • State: open
  • Created: 2 years ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

anmyachev commented, Nov 11, 2021 (1 reaction)

@dchigarev is there an easy way to fix pivot performance?

vnlitvinov commented, Aug 26, 2022 (0 reactions)

I have added some blocking calls to repr() to make sure we are measuring the whole computation and not just the time it takes to issue the jobs. Here is the run result on my machine (6 workers used):

def test2():
    import modin.pandas as pd
    import modin.config as cfg
    import pandas
    import numpy as np
    import numpy.random
    from time import time

    size = 10**6
    df = pandas.DataFrame(numpy.random.choice(pd.date_range(start='1/1/2021', end='3/1/2021'), size=size), columns=["timestamp"])
    df["description"] = pandas.Series([f"test_string{x}" for x in range(size)])
    df["numbers"] = pandas.Series(numpy.random.choice(np.random.uniform(1, 10, size=(10_000,)), size=size))

    pd.DataFrame(range(cfg.CpuCount.get() * cfg.MinPartitionSize().get())).to_numpy() # init the engine and start all the workers

    modin_df = pd.DataFrame(df)
    for _ in range(3):
        start = time()
        df.pivot(index="description", columns="timestamp", values="numbers")
        print(f"pandas time: {time()-start}")
    print("\n\n")
    repr(modin_df) # to wait till it's complete
    for _ in range(3):
        start = time()
        repr(modin_df.pivot(index="description", columns="timestamp", values="numbers"))
        print(f"modin time: {time()-start}")

test2()

pandas time: 1.4150474071502686
pandas time: 1.4090681076049805
pandas time: 1.4109303951263428

modin time: 5.301079034805298
modin time: 4.768923282623291
modin time: 4.864060163497925

This was measured against commit fbd1e2a1b91170a3c45ce9565b1051bb2d55e4eb.
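The idea of keeping a blocking call inside the timed region can be pulled out into a small helper. The sketch below uses only the standard library and is not part of Modin; `time_blocking` and `Deferred` are hypothetical names, and `Deferred` merely stands in for an object whose work is postponed until `repr()` is called.

```python
from time import perf_counter


def time_blocking(make_result, materialize=repr, repeats=3):
    """Time make_result() together with the call that forces the result to exist."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        result = make_result()
        materialize(result)  # block here so deferred work is counted
        timings.append(perf_counter() - start)
    return timings


class Deferred:
    """Hypothetical stand-in for a lazily evaluated result."""

    def __init__(self, work):
        self._work = work

    def __repr__(self):
        # The postponed work only runs when the object is printed.
        return repr(self._work())


timings = time_blocking(lambda: Deferred(lambda: sum(range(100_000))))
for elapsed in timings:
    print(f"deferred time: {elapsed}")
```

With the benchmark above, one would presumably pass a function that calls `modin_df.pivot(...)` as `make_result` and keep `repr` as the blocking step, as in the comment's own code.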
