Inputs contain futures that were created by another client
As per the Dask documentation, client.persist() returns futures pointing to tasks actively running in the background. In my code, I am concatenating multiple CSV files with persist and wait operations, then converting the Dask dataframe to a Dask array. But before I convert the Dask dataframe into a Dask array, I get the following error:
Inputs contain futures that were created by another client.
Task was destroyed but it is pending!
task: <Task pending coro=<Client._run() running at /home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/distributed/client.py:2429> wait_for=<Future pending cb=[<TaskWakeupMethWrapper object at 0x7f052e1ba710>()]> cb=[IOLoop.add_future.<locals>.<lambda>() at /home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/tornado/ioloop.py:688]>
Exception ignored in: <generator object sync.<locals>.f at 0x7f071b213e50>
Traceback (most recent call last):
File "/home/user/anaconda3/envs/Dask_2021_04_0/lib/python3.7/site-packages/distributed/utils.py", line 340, in f
assert thread_state.asynchronous > 0
AttributeError: '_thread._local' object has no attribute 'asynchronous'
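For context, distributed raises this ValueError when a collection whose partitions have already been turned into futures by one Client is computed or persisted through a different Client. A minimal sketch of that situation (the two in-process clients and the toy dataframe are illustrative, not taken from the issue):

import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

client_a = Client(processes=False)  # first client; its persist() creates the futures
ddf = dd.from_pandas(pd.DataFrame({"x": range(10)}), npartitions=2)
persisted = client_a.persist(ddf)   # partitions are now futures owned by client_a

client_b = Client(processes=False)  # a second, unrelated client
# Handing the persisted collection to the other client should fail with:
#   ValueError: Inputs contain futures that were created by another client.
client_b.compute(persisted)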
Please provide any suggestions to rectify this. The source code is as follows:
from dask.distributed import Client, LocalCluster, wait
import dask.dataframe as dd   # needed for dd.read_csv and map_partitions
import os

cluster = LocalCluster(n_workers=processes, threads_per_worker=1)  # 'processes' is set elsewhere
client = Client(cluster)

for i, csv_file in enumerate(os.listdir("path to multiple csv files")):
    df = dd.read_csv(csv_file)
    meta = get_meta(df)  # user-defined helper: compute meta for each partition
    result = df.map_partitions(lambda part: handle_part(part), meta=meta)  # handle_part is user-defined
    result1 = client.persist(result)
    wait(result1)
    del df
    if i == 0:
        result = result1
    else:
        results = result.append(result1)
        results = client.persist(results)
        wait(results)
        del result
        del result1
        result = results

result = client.persist(result)
wait(result)
X = result.to_dask_array(lengths=True)
X = client.persist(X)
wait(X)
del result
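One pattern that avoids losing references to already-persisted pieces inside such a loop, shown here only as a sketch and not necessarily the fix that was applied, is to keep every persisted part in a list and concatenate once at the end, so the underlying futures stay referenced until the final persist (worker count, directory path, and the omitted map_partitions step are placeholders):

import os
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster, wait

cluster = LocalCluster(n_workers=4, threads_per_worker=1)  # worker count is illustrative
client = Client(cluster)

csv_dir = "path to multiple csv files"  # placeholder path, as in the issue
parts = []
for csv_file in os.listdir(csv_dir):
    df = dd.read_csv(os.path.join(csv_dir, csv_file))
    part = client.persist(df)          # map_partitions(handle_part, meta=...) would go here
    parts.append(part)                 # keep the reference so its futures stay alive
wait(parts)

result = client.persist(dd.concat(parts))  # one concatenation instead of repeated append
wait(result)
X = client.persist(result.to_dask_array(lengths=True))
wait(X)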
Environment:
- Dask version: 2021.04.0
- Python version: 3.7
- Operating System: Ubuntu 20.04
- Install method (conda, pip, source): conda
Issue Analytics
- Created 2 years ago
- Comments: 6 (3 by maintainers)
Top GitHub Comments
Thank you @fjetter for providing the suggestion. Everything is working fine now. I think you were right that the assignment operation was losing the reference to the old future.
I do not really understand yet what's going on, but I suspect the for loop you are writing is somehow causing confusion with the futures/tasks. Every time you call

result = new_result

this loses the reference to the old future, which might trigger some race condition. Have you tried reading all CSVs with one API call instead of looping over the collection? https://docs.dask.org/en/latest/dataframe-api.html#dask.dataframe.read_csv
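For completeness, a sketch of that single-call approach (the glob pattern and client setup are illustrative, and handle_part is a stand-in for the per-partition processing from the issue):

import dask.dataframe as dd
from dask.distributed import Client, wait

client = Client()  # local cluster with default settings (illustrative)

def handle_part(part):
    return part  # placeholder for the per-partition processing in the issue

# One read_csv call with a glob pattern builds a single dask dataframe
# covering every file, so there is no per-file persist/append loop.
df = dd.read_csv("path/to/csvs/*.csv")

result = client.persist(df.map_partitions(handle_part))  # pass meta= as in the issue if needed
wait(result)

X = client.persist(result.to_dask_array(lengths=True))   # lengths=True computes chunk sizes
wait(X)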