ModuleNotFoundError when running modin with dask in AWS environment

System information

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux 4.14.252-131.483.amzn1.x86_64
  • Modin version (modin.__version__): 0.8.3
  • Python version: 3.6.10
  • Code we can use to reproduce:
import os
os.environ["MODIN_ENGINE"] = "dask"  # select the Dask engine before importing modin
import numpy as np  # needed for np.random.randint below
from dask.distributed import Client
client = Client('Dask-Scheduler.local-dask:8786')  # path to my dask scheduler cluster
import modin.pandas as pd
frame_data = np.random.randint(0, 100, size=(2**10, 2**8))
df = pd.DataFrame(frame_data)

Describe the problem

I’ve created a conda environment with the following requirements, based on this walkthrough:

conda create --name daskpy36 python=3.6.10 dask=2.14 distributed=2.14 s3fs=0.4.0 dask-glm=0.2.0 cytoolz=0.8.2 pandas=1.0.1 scikit-learn=0.23.2 matplotlib dask-ml=1.6.0 ipykernel -y

With this environment activated, I pip-installed modin[dask] and can import it without errors. However, as soon as I try to use the library, I get a ModuleNotFoundError, as shown in the logs section.

For some reason, it appears that my worker is using /opt/conda/lib/python3.6 for the pickle.loads call in the distributed package instead of the environment in my anaconda3/envs folder. This happens no matter which modin API call I make, from pd.read_csv to pd.DataFrame.
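
One way to confirm the mismatch described above is to compare the interpreter on the client with the one each worker runs. The following is a minimal sketch, not a fix: `describe_environment` and `report_workers` are hypothetical helper names introduced here, and it assumes the scheduler address from the reproduction code is reachable. It relies on `Client.run`, which executes a function on every worker and returns a dict keyed by worker address.

```python
import importlib.util
import sys


def describe_environment(module_name="modin"):
    """Report this process's interpreter and whether `module_name` is importable."""
    spec = importlib.util.find_spec(module_name)  # None if the module is missing
    return {
        "executable": sys.executable,
        "python_version": sys.version.split()[0],
        "module_found": spec is not None,
        "module_location": getattr(spec, "origin", None),
    }


def report_workers(scheduler_address):
    """Print the client-side and worker-side environments for comparison.

    Assumes dask.distributed is installed and the scheduler is reachable.
    """
    from dask.distributed import Client  # imported lazily; only needed here

    client = Client(scheduler_address)
    print("client:", describe_environment())
    # Client.run executes the function on each worker process.
    for address, info in client.run(describe_environment).items():
        print(address, info)


# Example call, using the address from the reproduction code:
# report_workers("Dask-Scheduler.local-dask:8786")
```

If the workers report an executable under /opt/conda and `module_found` is False, the workers run in a different environment from the notebook, and modin would need to be installed there as well.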

Source code / logs

ModuleNotFoundError                       Traceback (most recent call last)
~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/IPython/core/formatters.py in __call__(self, obj)
    343             method = get_real_method(obj, self.print_method)
    344             if method is not None:
--> 345                 return method()
    346             return None
    347         else:

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/pandas/dataframe.py in _repr_html_(self)
    196         # We use pandas _repr_html_ to get a string of the HTML representation
    197         # of the dataframe.
--> 198         result = self._build_repr_df(num_rows, num_cols)._repr_html_()
    199         if len(self.index) > num_rows or len(self.columns) > num_cols:
    200             # We split so that we insert our correct dataframe dimensions.

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/pandas/base.py in _build_repr_df(self, num_rows, num_cols)
    166         else:
    167             indexer = row_indexer
--> 168         return self.iloc[indexer]._query_compiler.to_pandas()
    169 
    170     def _update_inplace(self, new_query_compiler):

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/backends/pandas/query_compiler.py in to_pandas(self)
    233 
    234     def to_pandas(self):
--> 235         return self._modin_frame.to_pandas()
    236 
    237     @classmethod

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/data.py in to_pandas(self)
   2122             Pandas DataFrame.
   2123         """
-> 2124         df = self._frame_mgr_cls.to_pandas(self._partitions)
   2125         if df.empty:
   2126             if len(self.columns) != 0:

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in to_pandas(cls, partitions)
    529             A Pandas DataFrame
    530         """
--> 531         retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
    532         if all(
    533             isinstance(part, pandas.Series) for row in retrieved_objects for part in row

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in <listcomp>(.0)
    529             A Pandas DataFrame
    530         """
--> 531         retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
    532         if all(
    533             isinstance(part, pandas.Series) for row in retrieved_objects for part in row

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/base/frame/partition_manager.py in <listcomp>(.0)
    529             A Pandas DataFrame
    530         """
--> 531         retrieved_objects = [[obj.to_pandas() for obj in part] for part in partitions]
    532         if all(
    533             isinstance(part, pandas.Series) for row in retrieved_objects for part in row

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/dask/pandas_on_dask/frame/partition.py in to_pandas(self)
    119             A Pandas DataFrame.
    120         """
--> 121         dataframe = self.get()
    122         assert type(dataframe) is pandas.DataFrame or type(dataframe) is pandas.Series
    123 

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/modin/engines/dask/pandas_on_dask/frame/partition.py in get(self)
     61         if isinstance(self.future, pandas.DataFrame):
     62             return self.future
---> 63         return self.future.result()
     64 
     65     def apply(self, func, **kwargs):

~/anaconda3/envs/daskpy36/lib/python3.6/site-packages/distributed/client.py in result(self, timeout)
    216         if self.status == "error":
    217             typ, exc, tb = result
--> 218             raise exc.with_traceback(tb)
    219         elif self.status == "cancelled":
    220             raise result

/opt/conda/lib/python3.6/site-packages/distributed/protocol/pickle.py in loads()

ModuleNotFoundError: No module named 'modin'

Issue Analytics

  • State: open
  • Created 2 years ago
  • Comments: 8 (6 by maintainers)

Top GitHub Comments

1 reaction
YarShev commented, Jan 11, 2022

Modin requires python version >= 3.7.1.

0 reactions
vnlitvinov commented, Aug 29, 2022

@fhenrywells we’ve recently added support for “legacy” Python 3.6 and pandas 1.1 in https://github.com/modin-project/modin/pull/4800 and https://github.com/modin-project/modin/pull/4301, which might help with installing Modin in some AWS cases. We don’t have a release with this yet, so you’d have to install from master.

Please let us know if that helps!
