ModuleNotFoundError when running modin with dask in AWS environment
System information
- OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 4.14.252-131.483.amzn1.x86_64
- Modin version (modin.__version__): 0.8.3
- Python version: 3.6.10
- Code we can use to reproduce:
import os
os.environ["MODIN_ENGINE"] = "dask"  # select the Dask engine before importing modin
import numpy as np
from dask.distributed import Client
client = Client('Dask-Scheduler.local-dask:8786')  # address of my Dask scheduler cluster
import modin.pandas as pd
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)
Describe the problem
I’ve created a conda environment with the following requirements, based on this walkthrough:
conda create --name daskpy36 python=3.6.10 dask=2.14 distributed=2.14 s3fs=0.4.0 dask-glm=0.2.0 cytoolz=0.8.2 pandas=1.0.1 scikit-learn=0.23.2 matplotlib dask-ml=1.6.0 ipykernel -y
With this environment activated, I pip-installed modin[dask] and can import it without errors. However, as soon as I try to use the library, I get a ModuleNotFoundError, as shown in the logs section.
For some reason, my worker appears to be using /opt/conda/lib/python for the pickle.loads call in the distributed package, instead of the interpreter in my anaconda3/envs folder. This happens no matter which Modin API call I make, from pd.read_csv to pd.DataFrame.
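One way to confirm whether the workers really run in a different environment is to ask each of them which interpreter it is using and whether it can find modin. The sketch below is only a diagnostic aid, not a fix: probe_environment and workers_missing_module are hypothetical helper names introduced here, and the commented-out part assumes the scheduler address from the reproduction code is reachable. Client.run executes a function on every worker and returns a dict keyed by worker address.

```python
import importlib.util
import sys


def probe_environment(module_name="modin"):
    """Report the running interpreter and whether a top-level module is importable."""
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "python": ".".join(str(part) for part in sys.version_info[:3]),
        # find_spec returns None for a missing top-level module instead of raising.
        "has_module": importlib.util.find_spec(module_name) is not None,
    }


def workers_missing_module(worker_reports):
    """Return the sorted addresses of workers whose report says the module is missing."""
    return sorted(
        address
        for address, report in worker_reports.items()
        if not report["has_module"]
    )


# Local check: runs in the notebook (client) process.
print(probe_environment("modin"))

# Remote check (needs a reachable scheduler, so it is left commented out):
# from dask.distributed import Client
# client = Client("Dask-Scheduler.local-dask:8786")
# worker_reports = client.run(probe_environment, "modin")
# print(worker_reports)
# print(workers_missing_module(worker_reports))
```

If the local report shows a path under anaconda3/envs while the worker reports show /opt/conda and has_module is False, the workers are running in an environment where modin was never installed, which would be consistent with the traceback below.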
Source code / logs
ModuleNotFoundError Traceback (most recent call last)
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
343 method = get_real_method(obj, self.print_method)
344 if method is not None:
--> 345 return method()
346 return None
347 else:
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/pandas/dataframe.py in _repr_html_(self)
196 # We use pandas _repr_html_ to get a string of the HTML representation
197 # of the dataframe.
--> 198 result = self._build_repr_df(num_rows, num_cols)._repr_html_()
199 if len(self.index) > num_rows or len(self.columns) > num_cols:
200 # We split so that we insert our correct dataframe dimensions.
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/pandas/base.py in _build_repr_df(self, num_rows, num_cols)
166 else:
167 indexer = row_indexer
--> 168 return self.iloc[indexer]._query_compiler.to_pandas()
169
170 def _update_inplace(self, new_query_compiler):
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/backends/pandas/query_compiler.py in to_pandas(self)
233
234 def to_pandas(self):
--> 235 return self._modin_frame.to_pandas()
236
237 @classmethod
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/data.py in to_pandas(self)
2122 Pandas DataFrame.
2123 """
-> 2124 df = self._frame_mgr_cls.to_pandas(self._partitions)
2125 if df.empty:
2126 if len(self.columns) != 0:
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in to_pandas(cls, partitions)
529 A Pandas DataFrame
530 """
--> 531 retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
532 if all(
533 isinstance(part, pandas.Series) for row in retrieved_objects for part in row
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in <listcomp>(.0)
529 A Pandas DataFrame
530 """
--> 531 retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
532 if all(
533 isinstance(part, pandas.Series) for row in retrieved_objects for part in row
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in <listcomp>(.0)
529 A Pandas DataFrame
530 """
--> 531 retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
532 if all(
533 isinstance(part, pandas.Series) for row in retrieved_objects for part in row
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/dask/pandas_on_dask/frame/partition.py in to_pandas(self)
119 A Pandas DataFrame.
120 """
--> 121 dataframe = self.get()
122 assert type(dataframe) is pandas.DataFrame or type(dataframe) is pandas.Series
123
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/dask/pandas_on_dask/frame/partition.py in get(self)
61 if isinstance(self.future, pandas.DataFrame):
62 return self.future
---> 63 return self.future.result()
64
65 def apply(self, func, **kwargs):
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/distributed/client.py in result(self, timeout)
216 if self.status == "error":
217 typ, exc, tb = result
--> 218 raise exc.with_traceback(tb)
219 elif self.status == "cancelled":
220 raise result
/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads()
ModuleNotFoundError: No module named 'modin'
Modin requires python version >= 3.7.1.
@fhenrywells we’ve recently added support for “legacy” Python 3.6 and pandas 1.1 in https://github.com/modin-project/modin/pull/4800 and https://github.com/modin-project/modin/pull/4301. This might help with installing Modin in some AWS cases. We don’t have a release with this yet, so you’d have to install from master.
Please let us know if that helps!
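To see which side of the 3.7.1 requirement quoted above a given interpreter falls on, a version tuple comparison is enough. This is a minimal sketch: meets_minimum and MIN_PYTHON are hypothetical names, and the threshold is copied from the message in this thread rather than read from Modin's packaging metadata.

```python
import sys

# Threshold taken from the "Modin requires python version >= 3.7.1" message above.
MIN_PYTHON = (3, 7, 1)


def meets_minimum(version_info=None, minimum=MIN_PYTHON):
    """Return True if the given (major, minor, micro) version is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    return tuple(version_info[:3]) >= tuple(minimum)


# The interpreter from the report (3.6.10) is below the threshold.
print(meets_minimum((3, 6, 10)))  # False

# Check whichever interpreter is running this snippet.
print(meets_minimum())
```

Running the same check on the workers as well as the client would show whether both sides need the legacy-support build mentioned above.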