Python Interactive Window doesn't parallelize with dask properly
Environment data
- VS Code version: Version: 1.62.0-insider
- Extension version (available under the Extensions sidebar): v2021.11.1313923388-dev
- OS and version: macOS 11.6 (Apple Silicon M1)
- Python version (& distribution if applicable, e.g. Anaconda): 3.9.7 (conda-forge)
- Type of virtual environment used (N/A | venv | virtualenv | conda | …): conda
- Relevant/affected Python packages and their versions: dask 2021.9.1
- Relevant/affected Python-related VS Code extensions and their versions: Jupyter v2021.10.1001362801
- Value of the python.languageServer setting: Default
Actual behaviour
I have a .py file and am using # %% to run cells in the Python Interactive Window (which I love!).
I have a function (fiber_analysis) that takes 1 min to run on an image (essentially all of it in the last step), using 1 python process on my computer at 100% CPU. When I spin up the Interactive Window kernel I see 4 python3.9 processes in my Activity Monitor.
I have 4 performance cores and 4 efficiency cores, and this computation should be able to run in parallel since the computations are independent per image.
So I thought dask would help, and I made a dask array of my images, chunked by image.
(Note: I pared down my data set to just 6 images so the tests would be quicker.)
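Roughly, the setup looks like this (a sketch with placeholder shapes, using da.from_array; not the exact original code):

import numpy as np
import dask.array as da

# placeholder image stack; the real data is a stack of images, one chunk per image
images = np.random.rand(6, 1024, 1024)
all_samples_g = da.from_array(images, chunks=(1, 1024, 1024))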
I’ve set up to iterate over the slices and run the function as delayed:
from dask import delayed

output_s1 = []
for sample in all_samples_g:
    dvals = delayed(fiber_analysis)(sample)
    output_s1.append(dvals)
I can then compute this:
import dask
out_dvals = dask.compute(*output_s1)
And this uses one of the existing python3.9 processes at 115-120% CPU, essentially the same as just running on one image. This takes 7.5 min. So that's not really progress, but I understand that this should be using the dask threaded scheduler by default, so only non-python code will be parallelized. So next I try:
import dask
out_dvals = dask.compute(*output_s1, scheduler='processes')
This launches 1 additional python3.9 process and 1 additional process named python, and that python process uses ~125% CPU. Oddly, this computation takes >10 min, so much longer than just running the function on the images one by one. I killed it. Maybe I need to add workers? So let's try that: 4 workers, for my 4 performance cores:
import dask
out_dvals = dask.compute(*output_s1, num_workers=4)
Same as without the argument: 1 python3.9 process (out of the 4) at 115% CPU.
Run time: 7 min. So no great shakes.
What about with processes and 4 workers?
Same as before: 1 extra python3.9 process and 1 python process running at 125% CPU.
And again >10 min, so I killed it.
It appears that no speed benefit can be gained by using the regular dask schedulers in the Python Interactive Window… Maybe related to: https://github.com/microsoft/vscode-jupyter/issues/2962
Anyhow, by accident I reran:
out_dvals = dask.compute(*output_s1, scheduler='processes', num_workers=4)
with a stack of 15 images instead of 6.
Now 3 python processes spawn, each at 115% CPU, and 1 essentially idle extra python3.9 process.
So promising!
…but after 20 min I killed it, because that’s longer than just running each image one by one.
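A diagnostic sketch (not one of the runs above) to confirm how many distinct worker processes the tasks actually land on; it reuses fiber_analysis and all_samples_g from above and returns each worker's PID alongside the result:

import os
import dask
from dask import delayed

def traced_analysis(sample):
    # return the worker PID so we can see which process ran each task
    return os.getpid(), fiber_analysis(sample)

tasks = [delayed(traced_analysis)(sample) for sample in all_samples_g]
results = dask.compute(*tasks, scheduler='processes', num_workers=4)
print("distinct worker PIDs:", {pid for pid, _ in results})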
Expected behaviour
When rerunning the same code as a Jupyter notebook (.ipynb), with the 15-image stack, processed using
out_dvals = dask.compute(*output_s1, scheduler='processes', num_workers=4)
5 python3.9 processes were spawned in total by the command. 4 python3.9 processes go to 100% CPU, with all 4 performance cores totally slammed at 100%; the 5 remaining python3.9 processes are essentially idle. 4 min in, only 2 processes are at 100%, the rest idle; 5 min in, 1 process at 100%, the rest idle. And it finished in 5 min 15 s! 🎉
Steps to reproduce:
Not sure how I can make something that is easily shared, any advice?
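Maybe a synthetic CPU-bound function in place of fiber_analysis would make a shareable repro, along these lines (a sketch, untested; no image data needed):

import time
import dask
from dask import delayed

def cpu_bound(i):
    # pure-python busy loop standing in for fiber_analysis; holds the GIL the whole time
    total = 0
    for n in range(30_000_000):
        total += n * n
    return total

if __name__ == "__main__":  # keeps the processes scheduler happy if run as a plain script
    tasks = [delayed(cpu_bound)(i) for i in range(6)]
    start = time.perf_counter()
    dask.compute(*tasks, scheduler='processes', num_workers=4)
    print("elapsed:", time.perf_counter() - start)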
Logs
N/A
Top GitHub Comments
This might help: https://code.visualstudio.com/docs/python/debugging
Essentially you have to attach a debugger to the python kernel and then debug the execution of your dask code.
At some point the execution will go through this function: https://github.com/ipython/ipykernel/blob/fdda069bba36cafcc25df4d2353b26fbdb9e4d15/ipykernel/ipkernel.py#L294
A breakpoint is something that tells the debugger (after you attach) to stop execution at a specific line in a piece of code.
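For example, assuming debugpy is installed in the kernel's environment (an assumption, it may not be), a minimal way to attach is to run this in the Interactive Window and then attach the VS Code debugger to localhost:5678:

import debugpy

debugpy.listen(5678)        # open a debug adapter port inside the kernel process
debugpy.wait_for_client()   # pause here until the VS Code debugger attaches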
Ah sorry, I misunderstood. It works fine in a notebook but not in the IW (both running in VS Code). Then the disableZMQSupport flag will have no effect. The difference between running an IW kernel and a Jupyter kernel is just the things we set in it, like __file__, which is set in the IW but not in the notebook. Not sure how that would affect dask though. This likely requires debugging the kernel itself to see why it isn't parallelizing.
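A quick way to see that difference is to run the same check in both the IW and a notebook (just a one-line sketch):

# prints whether __file__ exists in this kernel and, if so, its value
print('__file__' in globals(), globals().get('__file__'))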