LinearRegression Exception: "AttributeError: 'tuple' object has no attribute 'shape'"

What happened: To train dask-ml models on a dask DataFrame, the DataFrame must first be converted to a dask array.


from dask_ml.linear_model import LinearRegression

# df is a dask DataFrame that looks something like this:
#
#                 flower.petal_length flower.petal_width flower.sepal_length
# npartitions=73
#                            float64            float64             float64
#                                ...                ...                 ...
# ...                            ...                ...                 ...
#                                ...                ...                 ...
#                                ...                ...                 ...
# Dask Name: astype, 1185 tasks

train_array = df.to_dask_array(lengths=True)
# As written in the report, the labels are also built from the whole of `df`;
# a regression target would normally be a single column (a 1-D array).
train_labels_array = df.to_dask_array(lengths=True)

# Calling train_array.compute() or train_labels_array.compute() here, with
# .fit() commented out, works without problems.

dask_model = LinearRegression()
dask_model.fit(train_array, train_labels_array)

When calling .fit on LinearRegression or LogisticRegression, I’m receiving the following output from the dask cluster:

distributed.worker - WARNING -  Compute Failed
Function:  subgraph_callable
args:      (('rechunk-merge-ff32808be3096a08028f7c8aa2a4bae3', 66, 0))
kwargs:    {}
Exception: AttributeError("'tuple' object has no attribute 'shape'",)

The warning is followed by this exception:

   File "my_model.py", line 89, in model
     dask_model.fit(train_array, train_labels_array)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask_ml/linear_model/glm.py", line 187, in fit
     self._coef = algorithms._solvers[self.solver](X, y, **solver_kwargs)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask_glm/utils.py", line 17, in normalize_inputs
     mean, std = da.compute(X.mean(axis=0), X.std(axis=0))
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/base.py", line 567, in compute
     results = schedule(dsk, keys, **kwargs)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 2676, in get
     results = self.gather(packed, asynchronous=asynchronous, direct=direct)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 1991, in gather
     asynchronous=asynchronous,
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 832, in sync
     self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/utils.py", line 340, in sync
     raise exc.with_traceback(tb)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/utils.py", line 324, in f
     result[0] = yield future
   File "/opt/rh/rh-python36/root/usr/lib64/python3.6/site-packages/tornado/gen.py", line 762, in run
     value = future.result()
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/distributed/client.py", line 1850, in _gather
     raise exception.with_traceback(traceback)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/optimization.py", line 963, in __call__
     return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/core.py", line 151, in get
     result = _execute_task(task, cache)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/core.py", line 121, in _execute_task
     return func(*(_execute_task(a, cache) for a in args))
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/utils.py", line 30, in apply
     return func(*args, **kwargs)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/array/reductions.py", line 594, in mean_chunk
     n = numel(x, dtype=dtype, **kwargs)
   File "/opt/rh/rh-python36/root/usr/lib/python3.6/site-packages/dask/array/reductions.py", line 558, in numel
     shape = x.shape
 AttributeError: 'tuple' object has no attribute 'shape'
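
The last frames show `numel` reading `x.shape` on a chunk that turned out to be a plain tuple, so some task upstream of the `mean` reduction apparently handed over a tuple where an ndarray was expected. A hedged diagnostic sketch: compute every block of a dask array separately and report any block whose value is not an `ndarray`. `find_non_array_chunks` and `make_tuple_chunk` are hypothetical helpers; the deliberately broken array built with `da.from_delayed` only shows that dask does not check a block's type until something uses it.

```python
import numpy as np
import dask
import dask.array as da


def find_non_array_chunks(arr):
    """Return (block_index, type_name) for each block that is not an ndarray."""
    blocks = arr.to_delayed()  # object ndarray of Delayed, one per block
    values = dask.compute(*blocks.ravel(), scheduler="sync")
    bad = []
    # np.ndindex walks block indices in the same C order as ravel().
    for idx, value in zip(np.ndindex(blocks.shape), values):
        if not isinstance(value, np.ndarray):
            bad.append((idx, type(value).__name__))
    return bad


@dask.delayed
def make_tuple_chunk():
    # Hypothetical buggy step: returns a tuple instead of an array.
    return (1.0, 2.0)


good = da.ones((10, 3), chunks=(4, 3))
broken = da.from_delayed(make_tuple_chunk(), shape=(2,), dtype=float)

print(find_non_array_chunks(good))
print(find_non_array_chunks(broken))
```

Running the same check on `train_array` just before `.fit()` could show whether such a block exists; it points at a place to look rather than a confirmed cause.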

What you expected to happen:

Minimal Complete Verifiable Example: Unfortunately, I’m unable to come up with a minimal example that reproduces this behavior. Before being converted to an array, the DataFrame above goes through several steps: it is published to the cluster, its index is reset, and it may undergo several transformations via task_graphs and delayed(func) calls followed by .compute(). The basic example above shows what is happening, but when I run it on a separate dask LocalCluster, outside the other environment, I don’t see the same issue.

Environment:

  • Dask version: 2.10.1
  • Dask-ml version: 1.7.0
  • dask-glm version: 0.2.0
  • sklearn version: 0.23.2
  • Python version: 3.6.8
  • Operating System: CentOS 7
  • Install method (conda, pip, source): pip

Issue Analytics

  • State: open
  • Created 3 years ago
  • Comments:5 (2 by maintainers)

Top GitHub Comments

8 reactions
HazemAbdelhafez commented, Sep 3, 2021

Hi @Jreyno40,

I had a similar error with a very basic example: https://examples.dask.org/dataframes/01-data-access.html Specifically, at the two lines:

df = dd.read_csv('data/2000-*-*.csv')
df.head()

I end up with a stack trace similar to yours, but the error is a bit different: “AttributeError: 'tuple' object has no attribute 'head'”.

Anyway, I solved it by calling the compute method with an explicit scheduler option, placed between the two lines above:

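import dask.dataframe as dd

df = dd.read_csv('data/2000-*-*.csv')
# compute() returns a pandas DataFrame, so head() below runs on pandas, not dask
df = df.compute(scheduler='threads')
df.head()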

I am only starting to learn Dask, but I guess this problem might be similar to yours, and the culprit could be the scheduler backend (threads vs. processes) and how data serialization works under the hood.

If you try the same example from the link above but either remove the client option altogether or change processes=False to processes=True, you don’t need the “df = df.compute(scheduler='threads')” line at all and the example works just fine.

Again, maybe I made a naive mistake as a new Dask user, but I hope this helps you fix the error you encountered.

0 reactions
KaeganCasey commented, Nov 9, 2022

As an update:

I tried upgrading to dask==2022.2.0 and dask-ml==2022.5.27 on the client and workers and re-running my code. I no longer receive the AttributeError mentioned above, but now I receive a different error when fitting the logistic regression: “ValueError: shapes (0,) and (739,) not aligned: 0 (dim 0) != 739 (dim 0)”. I printed the shapes of both arrays before fitting; they do align, and neither has a zero-length dimension.

It seems there might be some sort of communication issue with the dask workers, or something else I do not understand. The code also works perfectly fine with fewer than 5 partitions of data.

I will try to open a new issue for this error, since it seems different but could still be related.

Here is the stack trace for this error if it is helpful:

  File "xy_querying_data.py", line 346, in run
    lr.fit(X_train_arr, y_train_arr)
  File "/opt/conda/lib/python3.7/site-packages/dask_ml/linear_model/glm.py", line 188, in fit
    self._coef = algorithms._solvers[self.solver](X, y, **solver_kwargs)
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/utils.py", line 26, in normalize_inputs
    out = algo(Xn, y, *args, **kwargs).copy()
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/algorithms.py", line 265, in admm
    new_betas = np.array(da.compute(*new_betas))
  File "/opt/conda/lib/python3.7/site-packages/dask/base.py", line 573, in compute
    results = schedule(dsk, keys, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2994, in get
    results = self.gather(packed, asynchronous=asynchronous, direct=direct)
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2152, in gather
    asynchronous=asynchronous,
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 310, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 376, in sync
    raise exc.with_traceback(tb)
  File "/opt/conda/lib/python3.7/site-packages/distributed/utils.py", line 349, in f
    result = yield future
  File "/opt/conda/lib/python3.7/site-packages/tornado/gen.py", line 769, in run
    value = future.result()
  File "/opt/conda/lib/python3.7/site-packages/distributed/client.py", line 2009, in _gather
    raise exception.with_traceback(traceback)
  File "/opt/conda/lib/python3.7/site-packages/dask/utils.py", line 39, in apply
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/algorithms.py", line 300, in local_update
    maxfun=250)
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 198, in fmin_l_bfgs_b
    **opts)
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/lbfgsb.py", line 308, in _minimize_lbfgsb
    finite_diff_rel_step=finite_diff_rel_step)
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/optimize.py", line 262, in _prepare_scalar_function
    finite_diff_rel_step, bounds, epsilon=epsilon)
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 140, in __init__
    self._update_fun()
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 233, in _update_fun
    self._update_fun_impl()
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 137, in update_fun
    self.f = fun_wrapped(self.x)
  File "/opt/conda/lib/python3.7/site-packages/scipy/optimize/_differentiable_functions.py", line 134, in fun_wrapped
    return fun(np.copy(x), *args)
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/algorithms.py", line 234, in wrapped
    return func(beta, X, y) + (rho / 2) * np.dot(beta - z + u,
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/families.py", line 31, in pointwise_loss
    return Logistic.loglike(Xbeta, y)
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/families.py", line 24, in loglike
    return (Xbeta + log1p(enXbeta)).sum() - dot(y, Xbeta)
  File "/opt/conda/lib/python3.7/site-packages/multipledispatch/dispatcher.py", line 278, in __call__
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/dask_glm/utils.py", line 126, in dot
    return np.dot(A, B)
  File "<__array_function__ internals>", line 6, in dot
ValueError: shapes (0,) and (739,) not aligned: 0 (dim 0) != 739 (dim 0)
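
The `(0,)` operand suggests that, inside dask-glm's ADMM `local_update`, at least one block of `y` had zero rows. That reading is an assumption, not something confirmed in the thread, though it would be consistent with the code working on fewer partitions. A hedged sketch for finding and dropping zero-length row blocks before fitting: `drop_empty_row_chunks` is a hypothetical helper that relies only on `Array.chunks`, `Array.blocks`, and `da.concatenate`, and it needs known row chunks (for example from `to_dask_array(lengths=True)`).

```python
import numpy as np
import dask.array as da


def drop_empty_row_chunks(X, y):
    """Return X and y without their zero-length row blocks."""
    if X.chunks[0] != y.chunks[0]:
        raise ValueError("X and y must have the same row chunks")
    keep = [i for i, n in enumerate(X.chunks[0]) if n > 0]
    # X.blocks[i] selects row block i (all column blocks) as a dask array.
    X2 = da.concatenate([X.blocks[i] for i in keep], axis=0)
    y2 = da.concatenate([y.blocks[i] for i in keep], axis=0)
    return X2, y2


# Toy arrays with an empty middle block, as an empty partition would produce.
X = da.from_array(np.arange(12.0).reshape(6, 2), chunks=((3, 0, 3), (2,)))
y = da.from_array(np.arange(6.0), chunks=((3, 0, 3),))

empty = [i for i, n in enumerate(X.chunks[0]) if n == 0]
print("empty row blocks:", empty)

X2, y2 = drop_empty_row_chunks(X, y)
print(X2.chunks, y2.chunks)
```

Whether this removes the ValueError depends on empty blocks actually being the cause, which would need to be checked against the real arrays.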