
Inputs contain futures that were created by another client

See original GitHub issue

As per the Dask documentation, client.persist() returns futures pointing to the tasks actively running in the background. In my code, I am concatenating multiple CSV files with persist and wait operations, and then converting the resulting Dask DataFrame to a Dask array. But before I can convert the Dask DataFrame into a Dask array, I get the following error:

Inputs contain futures that were created by another client.
Task was destroyed but it is pending!
task: <Task pending coro=<Client._run() running at /home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/distributed/client.py:2429> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f052e1ba710>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/tornado/ioloop.py:688]>
Exception ignored in: <generator object sync.<locals>.f at 0x7f071b213e50>
Traceback (most recent call last):
  File "/home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/distributed/utils.py", line 340, in f
    assert thread_state.asynchronous > 0
AttributeError: '_thread._local' object has no attribute 'asynchronous'

Please provide any suggestions to rectify this. The source code is as follows:

import os

import dask.dataframe as dd
from dask.distributed import Client, LocalCluster, wait

processes = 4  # placeholder; the actual worker count is not shown in the issue

cluster = LocalCluster(n_workers=processes, threads_per_worker=1)
client = Client(cluster)

csv_dir = "path to multiple csv files"
for i, csv_file in enumerate(os.listdir(csv_dir)):
    df = dd.read_csv(os.path.join(csv_dir, csv_file))
    meta = get_meta(df)  # user-defined: compute meta for each partition
    result = df.map_partitions(lambda part: handle_part(part), meta=meta)
    result1 = client.persist(result)
    wait(result1)
    del df
    if i == 0:
        result = result1
    else:
        results = result.append(result1)
        results = client.persist(results)
        wait(results)
        del result
        del result1
        result = results
result = client.persist(result)
wait(result)

X = result.to_dask_array(lengths=True)
X = client.persist(X)
wait(X)
del result
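
For background, the ValueError in the traceback above is raised when a task graph submitted through one Client references futures owned by a different Client. A minimal sketch that reproduces just that error, using two throwaway in-process clients (this is an illustration, not code from the issue):

from dask.distributed import Client

client_a = Client(processes=False)  # first client, with its own in-process cluster
client_b = Client(processes=False)  # second, unrelated client

fut = client_a.submit(sum, [1, 2, 3])  # future owned by client_a
client_b.submit(abs, fut)  # ValueError: Inputs contain futures that were created by another client.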

Environment:

  • Dask version: 2021.04.0
  • Python version: 3.7
  • Operating System: Ubuntu 20.04
  • Install method (conda, pip, source): conda

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
rahulsemwaliiita commented, Apr 21, 2021

Thank you @fjetter for providing the suggestion. Now everything is working fine. I think you were right that the old future reference was lost due to the assignment operation.

1 reaction
fjetter commented, Apr 20, 2021

I do not really understand yet what’s going on, but I suspect the for loop you are writing is somehow causing confusion with the futures/tasks. Every time you call result = new_result, this loses the reference to the old future, which might trigger some race condition. Have you tried reading all the CSVs with one API call instead of looping over the collection?

https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv

import dask.dataframe as dd
from dask.distributed import Client

client = Client()
ddf = dd.read_csv("/path/to/my/directory/*.csv")
ddf = ddf.map_partitions(handle_part)
ddf = client.persist(ddf)  # Consider not persisting this
arr = ddf.to_dask_array()
arr = client.persist(arr)
del ddf  # If you persisted ddf above, delete it again at some point, since otherwise you potentially hold twice the data in RAM
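
Putting that together with the wait() and lengths=True steps from the original post, the whole pipeline might look like the sketch below; the worker count and glob path are placeholders, and handle_part is the user-defined function from the question:

import dask.dataframe as dd
from dask.distributed import Client, LocalCluster, wait

cluster = LocalCluster(n_workers=4, threads_per_worker=1)  # 4 is a placeholder
client = Client(cluster)

ddf = dd.read_csv("/path/to/my/directory/*.csv")  # one read_csv call, no Python loop
ddf = ddf.map_partitions(handle_part)

X = ddf.to_dask_array(lengths=True)  # lengths=True computes chunk sizes up front
X = client.persist(X)  # keep this reference alive for as long as X is in use
wait(X)  # block until the persisted tasks have finished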

Top Results From Across the Web

  • Unable to use published datasets in a different client #2336
    ValueError: Inputs contain futures that were created by another client. Dask and distributed 2.1.0.
  • How to find the concurrent.future input arguments for a Dask ...
    I’m creating a cluster and calling .submit() to submit a function to the scheduler. It returns a Futures object. I’m trying to figure...
  • Futures - Dask documentation
    You can pass futures as inputs to submit. Dask automatically handles dependency tracking; once all input futures have completed, they will be moved...
  • Inputs contain futures that were created by another client.
    Package: dask. Exception Class: ValueError ...
  • distributed.client — Dask.distributed 2.11.0 documentation
    A user manages future objects in the local Python process to determine what ... not self: msg = "Inputs contain futures that were...
