ResourceProfiler must be enclosed in `if __name__ == '__main__'` block on Windows

Under some circumstances the ResourceProfiler does not work, although the other profilers and the progress bar do: the profiler's `results` attribute is empty. It does not happen every time, but it occurs both in the IPython shell and in the plain Python interpreter. The following code records no resource profile and, when run from the Python terminal, prints an error (the program still continues). (dask 0.15.0, Python 3.6.0)

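import dask
from dask import delayed
from time import sleep
from dask.diagnostics import ResourceProfiler
from dask.diagnostics import visualize

# This block runs at module level. On Windows the profiler's tracker process
# is started with "spawn", which re-imports this file and reaches it again.
with ResourceProfiler(dt=0.25) as rprof:
    dask.compute(delayed(sleep)(20))
visualize([rprof])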

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 105, in spawn_main
    exitcode = _main(fd)
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 114, in _main
    prepare(preparation_data)
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 225, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 277, in _fixup_main_from_path
    run_name="__mp_main__")
  File "c:\anaconda\lib\runpy.py", line 263, in run_path
    pkg_name=pkg_name, script_name=fname)
  File "c:\anaconda\lib\runpy.py", line 96, in _run_module_code
    mod_name, mod_spec, pkg_name, script_name)
  File "c:\anaconda\lib\runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "D:\Projecten\MovSum\BUG_LineAnalysis.py", line 11, in <module>
    with ResourceProfiler(dt=0.25) as rprof:
  File "c:\anaconda\lib\site-packages\dask\diagnostics\profile.py", line 140, in __init__
    self._tracker.start()
  File "c:\anaconda\lib\multiprocessing\process.py", line 105, in start
    self._popen = self._Popen(self)
  File "c:\anaconda\lib\multiprocessing\context.py", line 223, in _Popen
    return _default_context.get_context().Process._Popen(process_obj)
  File "c:\anaconda\lib\multiprocessing\context.py", line 322, in _Popen
    return Popen(process_obj)
  File "c:\anaconda\lib\multiprocessing\popen_spawn_win32.py", line 33, in __init__
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 143, in get_preparation_data
    _check_not_importing_main()
  File "c:\anaconda\lib\multiprocessing\spawn.py", line 136, in _check_not_importing_main
    is not going to be frozen to produce an executable.''')
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Exception ignored in: <bound method ResourceProfiler.close of <dask.diagnostics.profile.ResourceProfiler object at 0x0000000002A65390>>
Traceback (most recent call last):
  File "c:\anaconda\lib\site-packages\dask\diagnostics\profile.py", line 173, in close
    self._tracker.shutdown()
  File "c:\anaconda\lib\site-packages\dask\diagnostics\profile.py", line 210, in shutdown
    self.join()
  File "c:\anaconda\lib\multiprocessing\process.py", line 120, in join
    assert self._popen is not None, 'can only join a started process'
AssertionError: can only join a started process
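The second traceback is a knock-on effect of the first: `start()` raised inside `ResourceProfiler.__init__` before the tracker process was created, so when `close()` later ran the shutdown, `join()` was called on a process that had never started. A minimal standard-library sketch of that assertion follows, together with a guarded alternative. `noop`, `join_unstarted` and `shutdown_if_started` are hypothetical helpers for illustration, not dask code.

```python
import multiprocessing as mp


def noop():
    pass


def join_unstarted():
    """Call join() on a Process that was created but never started."""
    proc = mp.Process(target=noop)
    try:
        proc.join()
    except AssertionError as err:
        return str(err)
    return None


def shutdown_if_started(proc):
    """Hypothetical defensive shutdown: only join a process that has a pid.

    Process.pid stays None until start() has succeeded.
    """
    if proc.pid is None:
        return False
    proc.join()
    return True


if __name__ == "__main__":
    print(join_unstarted())  # can only join a started process
    print(shutdown_if_started(mp.Process(target=noop)))  # False
```

Neither helper starts a child process, so this snippet runs the same way whatever start method the platform uses.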

Issue created 6 years ago; 11 comments (8 by maintainers).

Top GitHub Comments

jcrist commented, Jan 18, 2018

This was all with the visualize call after the with block. I’m guessing that’s the intended use.

That is correct. visualize is for plotting completed diagnostics, not currently running ones. For completeness, your example above:

import dask.array as da
from dask.diagnostics import Profiler, ResourceProfiler, CacheProfiler, visualize
a = da.random.random(size=(1000, 100), chunks=(100, 100))
q, r = da.linalg.qr(a)
a2 = q.dot(r)

with Profiler() as prof, ResourceProfiler(dt=0.25) as rprof, CacheProfiler() as cprof:
    out = a2.compute()

# not in the with block
visualize([prof, rprof, cprof])

mrocklin commented, Jun 23, 2017

I think it’s also reasonable to expect users to call code within if __name__ == '__main__' blocks. For example we already expect this if they use the multiprocessing scheduler.
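Below is a minimal standard-library sketch of the `if __name__ == '__main__'` idiom. `Tracker` is a hypothetical stand-in for ResourceProfiler, which, per the traceback, starts its helper process in `__init__`. The sketch forces the "spawn" start method (the only one on Windows, and the macOS default since Python 3.8) so it behaves the same on every platform. Under spawn, the child re-imports the script as `__mp_main__`, so any code that starts a process must sit behind the guard. The sketch assumes it is saved and run as a script file.

```python
import multiprocessing as mp
import time


def _sample(conn, dt):
    # Runs in the child process: take one timestamp immediately, then one
    # every `dt` seconds until the parent sends anything on the pipe.
    samples = [time.monotonic()]
    while not conn.poll(dt):
        samples.append(time.monotonic())
    conn.send(samples)


class Tracker:
    """Hypothetical stand-in for ResourceProfiler: starts a helper process."""

    def __init__(self, dt=0.05):
        ctx = mp.get_context("spawn")  # Windows-style process start everywhere
        self._conn, child_conn = ctx.Pipe()
        self._proc = ctx.Process(target=_sample, args=(child_conn, dt), daemon=True)
        # start() re-imports this file in the child under the name __mp_main__.
        self._proc.start()
        child_conn.close()  # the child holds its own copy of this end
        self.results = []

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        self._conn.send("stop")
        self.results = self._conn.recv()
        self._proc.join()
        return False


def main():
    with Tracker(dt=0.05) as tracker:
        time.sleep(0.3)  # stand-in for dask.compute(...)
    return tracker.results


if __name__ == "__main__":
    # Without this guard, importing the file in the child would call main()
    # again and fail with "An attempt has been made to start a new process
    # before the current process has finished its bootstrapping phase."
    results = main()
    print(f"collected {len(results)} samples")
```

Applied to the original report, the fix has the same shape. Put the `with ResourceProfiler(dt=0.25) as rprof:` block and the `visualize([rprof])` call under the guard, or inside a function called from it, and keep only imports and definitions at module level.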