Expand implementation of the DataFrame constructor to make it possible to construct from dictionaries with Modin entities as values

Describe the problem

We should expand the implementation of the DataFrame constructor so that a Modin DataFrame can be created quickly from a dictionary whose values are Modin Series. Currently, the following warning is emitted:

UserWarning: Distributing <class 'dict'> object. This may take some time.
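from time import time as timer

import numpy as np
# import pandas as pd  # uncomment this (and comment out the next line) to time plain pandas
import modin.pandas as pd
import ray

ray.init()  # start Ray explicitly; Modin uses it as the execution engine

nrows = 1_000_000_000
df = pd.DataFrame({"a": np.random.rand(nrows), "b": np.random.rand(nrows)})

# Time building a new DataFrame from a dict whose only value is a Modin Series
t = timer()
df2 = pd.DataFrame({"c": df.a})
print(f'df creation time: {timer() - t} s')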
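As a rough size check, assuming float64 columns (which is what np.random.rand returns), each column in the reproducer holds one billion 8-byte values. The arithmetic below is only an estimate of the raw data size; it makes no claim about how Modin partitions or stores the data internally.

```python
import struct

NROWS = 1_000_000_000                 # same row count as the reproducer
FLOAT64_BYTES = struct.calcsize("d")  # size of a C double / float64: 8 bytes
GIB = 2 ** 30

bytes_per_column = NROWS * FLOAT64_BYTES
bytes_df = 2 * bytes_per_column       # columns "a" and "b"

print(f"one column: {bytes_per_column / GIB:.2f} GiB")  # one column: 7.45 GiB
print(f"df (a, b):  {bytes_df / GIB:.2f} GiB")          # df (a, b):  14.90 GiB
```

So the `{"c": df.a}` call involves a column of roughly 7.45 GiB, which gives some context for why any path that re-materializes or re-distributes that column would be expensive.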

Results on 112 CPUs with the Ray engine:

df creation time: 3.937314748764038 s   # Pandas is used
df creation time: 24.079696655273438 s # Modin is used
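To put the two timings on one scale, the ratio below is computed directly from the numbers reported above. It reflects this single run only and is not a general benchmark.

```python
# Wall-clock times reported above (112 CPUs, Ray engine)
pandas_seconds = 3.937314748764038
modin_seconds = 24.079696655273438

slowdown = modin_seconds / pandas_seconds
print(f"Modin constructor is {slowdown:.1f}x slower than pandas")  # 6.1x
```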

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
prutskov commented, Feb 24, 2022

What about CPU count? Engine?

Information about the CPU count has been added to the PR description. The execution engine, Ray, is shown in the reproducer.

0 reactions
dchigarev commented, Nov 15, 2022

#5193 introduces a fast path only for the case where all of the dictionary values are Modin Series. I'm therefore reopening the issue to indicate that the implementation for the other cases is still missing.
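The distinction drawn in this comment can be sketched as a simple dispatch check. This is a hypothetical illustration, not Modin's actual code: `ModinSeries` is a stand-in class and `choose_construction_path` is an invented name. The only behavior taken from the thread is that the fast path from #5193 applies when every dictionary value is a Modin Series; other cases presumably still fall back to the slower distributing path.

```python
from typing import Any, Mapping


class ModinSeries:
    """Hypothetical stand-in for modin.pandas.Series (illustration only)."""

    def __init__(self, values):
        self.values = list(values)


def choose_construction_path(data: Mapping[str, Any]) -> str:
    """Return which constructor path a dict would take (hypothetical sketch)."""
    # Fast path (as described for #5193): every value is a Modin Series.
    # Treating an empty dict as not qualifying is an assumption of this sketch.
    if data and all(isinstance(v, ModinSeries) for v in data.values()):
        return "fast"
    # Any other mix of values (e.g. plain lists alongside Modin Series) is the
    # case the reopened issue says still lacks a fast implementation.
    return "distribute"


print(choose_construction_path({"c": ModinSeries([1, 2, 3])}))      # fast
print(choose_construction_path({"c": ModinSeries([1]), "d": [1]}))  # distribute
```

Keeping the check as an all-or-nothing test mirrors the comment's wording: a single non-Series value is enough to leave the fast path.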


Top Results From Across the Web

Stuck at this warning for >20 min: `UserWarning: Distributing ...
Expand implementation for DataFrame constructor to make possible construct from dictionaries with Modin entities as values #4263.
How To Create a Pandas Dataframe from a Dictionary
Here we construct a Pandas dataframe from a dictionary. We use the Pandas constructor, since it can handle different types of data structures....
How to create a Pandas Dataframe in Python
In Pandas, DataFrame is the primary data structure to hold tabular data. You can create it using the DataFrame constructor pandas.
Construct pandas DataFrame from items in nested dictionary
Specifically, my question is whether there exists a way to help the DataFrame constructor understand that the series should be built from...
pd.DataFrame supported APIs - Modin
DataFrame method | pandas Doc link | Implemented? (Y/N/P/D)
T | T | Y
abs | abs | Y
add | add | Y
