LinearRegression Exception: "AttributeError: 'tuple' object has no attribute 'shape'"
What happened:
In order to use dask-ml models to train on a dask DataFrame, the DataFrame must be converted to a dask array.
from dask_ml.linear_model import LinearRegression

# df is a dask DataFrame that looks something like this:
#
#                 flower.petal_length  flower.petal_width  flower.sepal_length
# npartitions=73
#                             float64             float64              float64
#                                 ...                 ...                  ...
# ...                             ...                 ...                  ...
#                                 ...                 ...                  ...
#                                 ...                 ...                  ...
# Dask Name: astype, 1185 tasks
train_array = df.to_dask_array(lengths=True)
train_labels_array = df.to_dask_array(lengths=True)
# If I simply do a train_array.compute() or train_labels_array.compute() here & comment out .fit(), there are no problems
dask_model = LinearRegression()
dask_model.fit(train_array, train_labels_array)
When calling .fit on LinearRegression or LogisticRegression, I’m receiving the following output from the dask cluster:
distributed.worker - WARNING - Compute Failed
Function: subgraph_callable
args: (('rechunk-merge-ff32808be3096a08028f7c8aa2a4bae3', 66, 0))
kwargs: {}
Exception: AttributeError("'tuple' object has no attribute 'shape'",)
Followed by the following exception being thrown:
File "my_model.py", line 89, in model
dask_model.fit(train_array, train_labels_array)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask_ml/linear_model/glm.py", line 187, in fit
self._coef = algorithms._solvers[self.solver](X, y, **solver_kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask_glm/utils.py", line 17, in normalize_inputs
mean, std = da.compute(X.mean(axis=0), X.std(axis=0))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/base.py", line 567, in compute
results = schedule(dsk, keys, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 2676, in get
results = self.gather(packed, asynchronous=asynchronous, direct=direct)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 1991, in gather
asynchronous=asynchronous,
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 832, in sync
self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/utils.py", line 340, in sync
raise exc.with_traceback(tb)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/utils.py", line 324, in f
result[0] = yield future
File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/tornado/gen.py", line 762, in run
value = future.result()
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 1850, in _gather
raise exception.with_traceback(traceback)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/optimization.py", line 963, in __call__
return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/core.py", line 151, in get
result = _execute_task(task, cache)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/core.py", line 121, in _execute_task
return func(*(_execute_task(a, cache) for a in args))
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/utils.py", line 30, in apply
return func(*args, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/array/reductions.py", line 594, in mean_chunk
n = numel(x, dtype=dtype, **kwargs)
File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/array/reductions.py", line 558, in numel
shape = x.shape
AttributeError: 'tuple' object has no attribute 'shape'
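For what it's worth, the failing helper just reads the chunk's .shape, so the error means a plain tuple reached the reduction step where a NumPy array chunk was expected. A tiny illustration of the mechanism (not dask's actual code):

import numpy as np

def numel_like(x):
    # mirrors the failing line above: assumes x is array-like with a .shape attribute
    return np.prod(x.shape)

numel_like(np.ones((3, 2)))  # fine: returns 6.0
numel_like((3, 2))           # AttributeError: 'tuple' object has no attribute 'shape'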
What you expected to happen:
Minimal Complete Verifiable Example:
Unfortunately, I'm unable to come up with a minimal example that reproduces this behavior. The DataFrame above goes through several steps before it is converted to an array: it is published in a cluster, has its index reset, and potentially undergoes several transformations via task graphs and delayed(func) / .compute() calls. The basic example above illustrates what's happening, but when I run the same thing on a dask LocalCluster, separate from the other environment, I don't see the issue.
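For reference, a minimal sketch of the kind of pipeline described above, run on a LocalCluster with synthetic data (the column names, sizes, partition count, and astype step are assumptions based on the repr shown earlier):

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
from dask_ml.linear_model import LinearRegression

client = Client(LocalCluster(n_workers=4))

# Synthetic stand-in for the DataFrame shown above
pdf = pd.DataFrame(
    np.random.rand(10_000, 3),
    columns=["flower.petal_length", "flower.petal_width", "flower.sepal_length"],
)
df = dd.from_pandas(pdf, npartitions=73).reset_index(drop=True).astype("float64")

X = df[["flower.petal_length", "flower.petal_width"]].to_dask_array(lengths=True)
y = df["flower.sepal_length"].to_dask_array(lengths=True)

LinearRegression().fit(X, y)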
Environment:
- Dask version: 2.10.1
- Dask-ml version: 1.7.0
- dask-glm version: 0.2.0
- sklearn version: 0.23.2
- Python version: 3.6.8
- Operating System: CentOS 7
- Install method (conda, pip, source): pip
Top GitHub Comments
Hi @Jreyno40,
I had a similar error with a very basic example: https://examples.dask.org/dataframes/01-data-access.html, specifically at two lines in that notebook.
I end up with a stack trace similar to yours, but the cause is a bit different: "AttributeError: 'tuple' object has no attribute 'head'".
Anyway, I solved it by specifying a scheduler option to the compute method and placing it in between the two lines above.
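A sketch of that workaround, assuming the two lines in question were the CSV-reading and head() steps from that notebook (a reconstruction with a placeholder path, not the commenter's exact code):

import dask.dataframe as dd

df = dd.read_csv("data/*.csv")          # first of the two lines (assumed)
df = df.compute(scheduler="threads")    # workaround: threaded compute placed in between
df.head()                               # second of the two lines (assumed)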
I am only starting to learn how to use Dask, but I guess this problem might be similar to yours, and the culprit could be the scheduler backend (threads vs. processes) and how data serialization works under the hood.
If you try the same example from the link above but remove the client option altogether, or change processes=False to processes=True, then you don't need the df = df.compute(scheduler='threads') line at all and the example works just fine.
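For completeness, the client option being varied is presumably along these lines (again an assumption based on that notebook):

from dask.distributed import Client

# With processes=False, the compute(scheduler="threads") workaround was needed;
# with processes=True, or with no Client at all, the example reportedly runs as-is.
client = Client(processes=False)  # change to processes=True, or omit the Client entirely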
Again, maybe I made some naive mistake as a new Dask user, but I hope this helps you fix the error you encountered.
As an update:
I tried upgrading to dask==2022.2.0 and dask-ml==2022.5.27 on the client and workers and re-running my code. I no longer receive the attribute error mentioned above, but now I receive a different error when fitting the logistic regression: "ValueError: shapes (0,) and (739,) not aligned: 0 (dim 0) != 739 (dim 0)". I have printed out the shapes of both arrays before fitting; their shapes do align and neither of them has zero dimensions.
It seems like there might be some sort of communication issue with the dask workers or something else that I do not understand. This code also works perfectly fine with less than 5 partitions of data.
I will try and open up a new issue for this error since it sounds like it is different but could still be related.
Here is the stack trace for this error if it is helpful: