
Already executed Dask tasks get re-executed in Spyder


Issue Report Checklist

  • Searched the issues page for similar reports
  • Read the relevant sections of the Spyder Troubleshooting Guide and followed its advice
  • Reproduced the issue after updating with conda update spyder (or pip, if not using Anaconda)
  • Could not reproduce inside jupyter qtconsole (if console-related)
  • Tried basic troubleshooting (if a bug/error)
    • Restarted Spyder
    • Reset preferences with spyder --reset
    • Reinstalled the latest version of Anaconda
    • Tried the other applicable steps from the Troubleshooting Guide
  • Completed the Problem Description, Steps to Reproduce and Version sections below

Problem Description

For some reason, Spyder triggers re-execution of Dask tasks that have already finished. This behavior does not occur when the same code is executed in plain Python or IPython. Restarting the Dask cluster does not remove these tasks from memory and they keep re-executing; the only way to get rid of them is to restart Spyder.

What steps reproduce the problem?

import numpy as np
import pandas as pd
import dask.dataframe as dd
from dask.distributed import Client

# start a local Dask cluster
client = Client()

# build the df and df2 Dask DataFrames and write each to Parquet
df = dd.from_pandas(pd.DataFrame({'a': np.arange(10_000_000), 'b': np.arange(10_000_000)}), npartitions=100)
df = df.set_index('a')
df.to_parquet('test')

df2 = dd.from_pandas(pd.DataFrame({'a': np.arange(100_000_000), 'b': np.arange(100_000_000)}), npartitions=200)
df2 = df2.set_index('a')
df2.to_parquet('test2')

# open the Dask dashboard at http://localhost:8787/status and watch it while
# executing the line below (keeping the browser and Spyder side by side makes
# it easy to see the tasks appear)
df2.head()

# head() should not trigger execution of any tasks belonging to df, but it does,
# as evidenced by 100 from_pandas and len_chunk tasks for df appearing in the dashboard

What is the expected output? What do you see instead?

I expect only the relevant Dask tasks to be executed. When the above code is run in plain Python or IPython, the tasks associated with the df DataFrame are not executed.
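
To confirm which tasks actually run without watching the dashboard, here is a sketch using distributed's get_task_stream diagnostic. It assumes the client and df2 variables from the reproduction above and that each recorded event carries a 'key' field:

from distributed import get_task_stream

# record every task the scheduler runs while head() executes
with get_task_stream(client) as ts:
    df2.head()

# in plain IPython only df2-related keys should appear here; under Spyder the
# report above also sees from_pandas/len_chunk tasks belonging to df
print(sorted({event['key'] for event in ts.data})[:10])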

Versions

  • Spyder version: 5.3.1
  • Python version: 3.8.13
  • Qt version: 5.15.4
  • PyQt version: 5.15.4
  • Operating System name/version: Windows Server 2016 (64-bit)

Dependencies

dask=2022.6.0

Issue Analytics

  • State: closed
  • Created: a year ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
bsesar commented, Jun 28, 2022

Hi @dalthviz. After I turned off the Variable Explorer, the unwanted triggering of Dask tasks stopped. Thanks! 😃
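
That fix lines up with how Dask collections behave under introspection: merely asking a Dask DataFrame for its size forces computation. A minimal sketch, assuming the Variable Explorer's refresh probes variables roughly the way len() does:

import numpy as np
import pandas as pd
import dask.dataframe as dd

df = dd.from_pandas(
    pd.DataFrame({'a': np.arange(1000), 'b': np.arange(1000)}),
    npartitions=4,
)

# len() on a Dask DataFrame is not free: it computes the length of every
# partition (the len_chunk-style tasks seen in the dashboard) and sums them
print(len(df))  # submits one length task per partition, i.e. 4 here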

0 reactions
dalthviz commented, Aug 2, 2022

Note: The kernel call that triggers the Dask tasks is get_var_properties, which is invoked by refresh_namespacebrowser (called after every console execution).
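
To see why such a refresh re-submits Dask graphs, here is a rough sketch of what a namespace-properties pass conceptually does; the function name and returned fields are hypothetical, not Spyder's actual implementation:

# hypothetical sketch of a namespace-properties pass; not Spyder's real code
def sketch_get_var_properties(namespace):
    props = {}
    for name, value in namespace.items():
        try:
            # sizing a variable looks harmless, but for a Dask DataFrame
            # len() re-submits its whole from_pandas/len_chunk graph
            size = len(value)
        except TypeError:
            size = None
        props[name] = {'type': type(value).__name__, 'len': size}
    return props

Run after every console execution, a pass like this would re-trigger the graphs for df and df2 each time, matching the reported behavior.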
