Already executed Dask tasks get re-executed in Spyder
Issue Report Checklist
- Searched the issues page for similar reports
- Read the relevant sections of the Spyder Troubleshooting Guide and followed its advice
- Reproduced the issue after updating with conda update spyder (or pip, if not using Anaconda)
- Could not reproduce inside jupyter qtconsole (if console-related)
- Tried basic troubleshooting (if a bug/error)
  - Restarted Spyder
  - Reset preferences with spyder --reset
  - Reinstalled the latest version of Anaconda
  - Tried the other applicable steps from the Troubleshooting Guide
- Completed the Problem Description, Steps to Reproduce and Version sections below
Problem Description
For some reason, Spyder triggers re-execution of Dask tasks that have already finished. This behavior does not occur when the same code is executed in Python or IPython. Restarting the Dask cluster does not remove these tasks from memory, and they keep re-executing. The only way to remove the tasks is to restart Spyder.
What steps reproduce the problem?
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client
import numpy as np
# start a local Dask cluster
client = Client()
# execute these blocks associated with df and df2 Dask DataFrames
df = dd.from_pandas(pd.DataFrame({'a':np.arange(10000000), 'b':np.arange(10000000)}), npartitions=100)
df = df.set_index('a')
df.to_parquet('test')
df2 = dd.from_pandas(pd.DataFrame({'a':np.arange(100000000), 'b':np.arange(100000000)}), npartitions=200)
df2 = df2.set_index('a')
df2.to_parquet('test2')
# open the Dask dashboard by using this URL in a browser: http://localhost:8787/status
# observe the dashboard as you execute the line below
# (you may need to have the browser and Spyder side by side to see the tasks appear in the dashboard)
df2.head()
# the above command should not trigger execution of tasks related to the df Dask DataFrame, but it does,
# as evidenced by the appearance (i.e., execution) of 100 from_pandas and len_chunk tasks associated with the df Dask DataFrame
What is the expected output? What do you see instead?
I expect only the relevant Dask code to be executed. When the above code is executed in Python or IPython, tasks associated with the df Dask DataFrame do not get executed.
Versions
- Spyder version: 5.3.1
- Python version: 3.8.13
- Qt version: 5.15.4
- PyQt version: 5.15.4
- Operating System name/version: Windows Server 2016 64 bits
Dependencies
dask=2022.6.0
Issue Analytics
- State:
- Created a year ago
- Comments: 6 (4 by maintainers)
Hi @dalthviz. After I turned off Variable Explorer, the unwanted triggering of Dask tasks stopped. Thanks! 😃
Note: The call to the kernel that triggers the Dask tasks is caused by a call to get_var_properties made during a call to refresh_namespacebrowser (which is called after any console execution).
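
The mechanism described in the note can be illustrated with a small, self-contained sketch. It does not use Spyder's or Dask's actual code: LazyFrame and scan_namespace are hypothetical stand-ins, and the assumption that the namespace refresh asks each variable for its length is only suggested by the len_chunk tasks reported above, not confirmed in this thread. The point is that any inspector that walks every variable after each command will also touch variables the command never used.

```python
class LazyFrame:
    """Hypothetical stand-in for a lazy collection such as a Dask DataFrame."""

    def __init__(self, n_rows):
        self.n_rows = n_rows
        self.compute_calls = 0  # counts how often real work was triggered

    def __len__(self):
        # For a lazy collection, answering len() requires running tasks.
        self.compute_calls += 1
        return self.n_rows


def scan_namespace(namespace):
    """Hypothetical inspector: record the length of every sized variable."""
    properties = {}
    for name, value in namespace.items():
        if hasattr(value, "__len__"):
            properties[name] = {"len": len(value)}
    return properties


namespace = {"df": LazyFrame(10), "df2": LazyFrame(20)}

# The user runs a command that only concerns df2; afterwards the
# (simulated) namespace refresh inspects every variable, including df.
scan_namespace(namespace)

print(namespace["df"].compute_calls)   # 1: df was computed although unused
print(namespace["df2"].compute_calls)  # 1
```

Under this assumption, every console execution adds one more round of computation for each lazy variable in the namespace, which is consistent with the tasks disappearing once the Variable Explorer is turned off.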