[Bug] modin-on-ray's unwrap_partitions is 100x slower on Mac than on Windows
Search before asking
- I searched the issues and found no similar issues.
Ray Component
Ray Core
What happened + What you expected to happen
I was doing some development on Modin and wanted to call unwrap_partitions, which uses Ray to materialize the partitions that make up a Modin dataframe. The call took 122 seconds on my MacBook but only 721 milliseconds on a Windows computer.
Versions / Dependencies
Before running the reproduction script, install Modin with
pip install modin
and download this file.
My Mac is on:
- MacBook Pro (16-inch, 2019), macOS Big Sur 11.5.2
- Python 3.8.8
- Ray 1.8.0
- Modin 0.11.1+37.g0a3acc15
- RAM: 16 GB 2667 MHz DDR4
- ray.cluster_resources(): {'CPU': 16.0, 'object_store_memory': 3248034201.0, 'memory': 6496068404.0, 'node:127.0.0.1': 1.0}
The Windows computer is:
- Windows 10 machine with 10 cores
- Python 3.8.5
- Ray 1.7.1
- Modin 0.11.1+45.g41213581
Reproduction script
from modin.distributed.dataframe.pandas import unwrap_partitions
import modin.pandas as pd
import ray
from modin.config import NPartitions

fdf = pd.read_csv("test_700kx256.csv")
%time ray.wait(unwrap_partitions(fdf, axis=0), num_returns=NPartitions.get())
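The %time line above only works in IPython or Jupyter. For anyone reproducing this from a plain Python script, a minimal timing helper might look like the sketch below. The name timed is hypothetical and not part of Modin or Ray. The commented-out usage assumes the imports and fdf from the script above.

```python
import time


def timed(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


# Hypothetical usage with the reproduction script (requires Modin and Ray):
# _, elapsed = timed(
#     ray.wait,
#     unwrap_partitions(fdf, axis=0),
#     num_returns=NPartitions.get(),
# )
# print(f"ray.wait took {elapsed:.3f} s")
```

time.perf_counter is used rather than time.time because it is a monotonic, high-resolution clock, which matters when comparing a sub-second run against a multi-minute one.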
Anything else
Some other Modin operations have been drastically slower on my Mac than on the same Windows machine. I have observed the same problem with other Macs. I have tried to provide a simple, reproducible example here.
Are you willing to submit a PR?
- Yes I am willing to submit a PR!
Issue Analytics
- Created 2 years ago
- Comments: 64 (54 by maintainers)
Haha wow, I didn’t realize this was actually faster on Windows! First time something is faster on Windows I think 😃
@devin-petersohn @mvashishtha are y’all able to reproduce this without a dataset? Or really ideally, simulate modin’s behavior here with just the ray core?
cc @scv119 @rkooo567
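As a starting point for the dataset-free reproduction requested above, here is a rough sketch that uses only Ray core and NumPy. It is an approximation under several assumptions, not Modin's actual code path: the 700,000 x 256 shape is guessed from the CSV filename, 16 partitions matches the CPU count reported on the Mac, and make_block, split_rows and run_repro are hypothetical names. Each remote task builds a random float block in place of a parsed CSV partition, so the measured time covers block creation as well as the wait.

```python
def split_rows(n_rows, n_parts):
    """Return (start, stop) row ranges splitting n_rows into at most n_parts chunks."""
    step = max(1, -(-n_rows // n_parts))  # ceiling division, at least 1
    return [(lo, min(lo + step, n_rows)) for lo in range(0, n_rows, step)]


def run_repro(n_rows=700_000, n_cols=256, n_parts=16):
    # Imported lazily so split_rows stays usable without Ray or NumPy installed.
    import time

    import numpy as np
    import ray

    ray.init(ignore_reinit_error=True)

    @ray.remote
    def make_block(n, m):
        # Hypothetical stand-in for one Modin row partition.
        return np.random.rand(n, m)

    # Submit one task per row partition; each returns an ObjectRef.
    refs = [
        make_block.remote(stop - start, n_cols)
        for start, stop in split_rows(n_rows, n_parts)
    ]

    # Time how long it takes for every block to be ready in the object store.
    start = time.perf_counter()
    ray.wait(refs, num_returns=len(refs))
    elapsed = time.perf_counter() - start
    print(f"ray.wait on {len(refs)} blocks took {elapsed:.3f} s")
    return elapsed
```

Calling run_repro() on both machines would show whether the gap appears with Ray alone. If it does not, the difference more likely comes from how Modin reads and partitions the CSV than from the wait itself.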