Expand the DataFrame constructor implementation to allow construction from dictionaries with Modin entities as values
Describe the problem
We should expand the implementation of the DataFrame constructor so that a Modin DataFrame can be created quickly from a dictionary whose values are Modin Series. Currently, doing so triggers the following warning:
UserWarning: Distributing <class 'dict'> object. This may take some time.
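from time import time as timer
import numpy as np
# import pandas as pd  # use this import instead of the next one to time plain pandas
import modin.pandas as pd
import ray
ray.init()
nrows = 1_000_000_000
# Build the source frame; this step is not timed
df = pd.DataFrame({"a": np.random.rand(nrows), "b": np.random.rand(nrows)})
t = timer()
# Timed step: build a new frame from a dict whose value is a Series of df
df2 = pd.DataFrame({"c": df.a})
print(f'df creation time: {timer() - t} s')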
The result on 112 CPUs, Ray engine:
df creation time: 3.937314748764038 s # Pandas is used
df creation time: 24.079696655273438 s # Modin is used
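Until the constructor handles this case, one possible workaround is to avoid passing a dict at all and derive the new frame from the existing Series instead. The sketch below is an assumption, not something measured in this issue: it uses only `Series.to_frame`, `Series.rename` and `pd.concat` from the pandas API that Modin mirrors, and whether it actually avoids the "Distributing <class 'dict'> object" warning depends on the Modin version. It falls back to plain pandas when Modin is not installed, and uses a small `nrows` for illustration.

```python
import numpy as np

# Assumption: prefer Modin when it is installed, otherwise fall back to pandas
try:
    import modin.pandas as pd
except ImportError:
    import pandas as pd

nrows = 1_000  # small size for illustration only
df = pd.DataFrame({"a": np.random.rand(nrows), "b": np.random.rand(nrows)})

# Single column: equivalent in content to pd.DataFrame({"c": df.a})
df2 = df["a"].to_frame(name="c")

# Several columns: equivalent in content to pd.DataFrame({"c": df.a, "d": df.b})
df3 = pd.concat([df["a"].rename("c"), df["b"].rename("d")], axis=1)
```

Both results hold the same data as the dict-based constructor call would, so the rest of the reproducer can be timed unchanged against them.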
Issue Analytics
- State:
- Created 2 years ago
- Comments:6 (6 by maintainers)
The CPU count has been added to the PR description. The execution engine is shown in the reproducer: Ray.
#5193 introduces a fast path only for the case where all of the dictionary values are Modin Series. Reopening the issue to indicate that the implementation for the other cases is still missing.
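The condition described above can be sketched as a simple dispatch check. This is only an illustration of the "all values are Modin Series" rule, not the code from #5193: `all_values_are`, `choose_path` and `StandInSeries` are hypothetical names, and `StandInSeries` stands in for `modin.pandas.Series` so the example runs without Modin installed.

```python
from collections.abc import Mapping
from typing import Any


def all_values_are(data: Any, series_type: type) -> bool:
    """Return True only for a non-empty mapping whose values are all `series_type`."""
    if not isinstance(data, Mapping) or len(data) == 0:
        return False
    return all(isinstance(value, series_type) for value in data.values())


def choose_path(data: Any, series_type: type) -> str:
    """Pick the fast path only when every dict value is a `series_type` instance."""
    return "fast" if all_values_are(data, series_type) else "fallback"


class StandInSeries:
    """Hypothetical stand-in for modin.pandas.Series."""


print(choose_path({"c": StandInSeries()}, StandInSeries))                  # fast
print(choose_path({"c": StandInSeries(), "d": [1, 2, 3]}, StandInSeries))  # fallback
```

The second call shows the case this issue remains open for: a single non-Series value (here a plain list) is enough to send the whole dict down the slow path.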