
Multicore inference not working

See original GitHub issue

First of all, thank you very much for working on this great project!

My issue is that even for very simple models, running chains on multiple cores doesn’t work.

Content of test_bambi.py:

import bambi as bmb
import pandas as pd
import numpy as np

data = pd.DataFrame({
    "y": np.random.normal(size=50),
    "x1": np.random.normal(size=50),
    "x2": np.random.normal(size=50)
})

model = bmb.Model("y ~ x1 + x2", data)
fitted = model.fit(cores=2)

Output:

> python3 test_bambi.py               
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
Traceback (most recent call last):
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/forkserver.py", line 274, in main
    code = _serve_one(child_r, fds,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/forkserver.py", line 313, in _serve_one
    code = spawn._main(child_r, parent_sentinel)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 125, in _main
    prepare(preparation_data)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 236, in prepare
    _fixup_main_from_path(data['init_main_from_path'])
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 287, in _fixup_main_from_path
    main_content = runpy.run_path(main_path,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 268, in run_path
    return _run_module_code(code, init_globals, run_name,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 97, in _run_module_code
    _run_code(code, mod_globals, init_globals,
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/Users/harasim/Documents/repos/python/test-bambi/test_bambi.py", line 12, in <module>
    fitted = model.fit(cores=2)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/models.py", line 278, in fit
    return self.backend.run(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 90, in run
    result = self._run_mcmc(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 217, in _run_mcmc
    idata = pm.sample(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 559, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 1461, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 431, in __init__
    self._samplers = [
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 432, in <listcomp>
    ProcessAdapter(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 292, in __init__
    self._process.start()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.
Traceback (most recent call last):
  File "/Users/harasim/Documents/repos/python/test-bambi/test_bambi.py", line 12, in <module>
    fitted = model.fit(cores=2)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/models.py", line 278, in fit
    return self.backend.run(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 90, in run
    result = self._run_mcmc(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/bambi/backend/pymc.py", line 217, in _run_mcmc
    idata = pm.sample(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 559, in sample
    trace = _mp_sample(**sample_args, **parallel_args)
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/sampling.py", line 1461, in _mp_sample
    sampler = ps.ParallelSampler(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 431, in __init__
    self._samplers = [
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 432, in <listcomp>
    ProcessAdapter(
  File "/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/pymc3/parallel_sampling.py", line 292, in __init__
    self._process.start()
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/context.py", line 291, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 35, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/python@3.9/3.9.7_1/Frameworks/Python.framework/Versions/3.9/lib/python3.9/multiprocessing/popen_forkserver.py", line 58, in _launch
    f.write(buf.getbuffer())
BrokenPipeError: [Errno 32] Broken pipe

However, if I call model.fit(cores=1), it runs the chains sequentially and succeeds.
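The RuntimeError above spells out the standard fix: under the `spawn` and `forkserver` start methods (the macOS defaults since Python 3.8), child processes re-import the main script, so any code that launches workers must sit behind an `if __name__ == '__main__':` guard. A minimal stdlib-only sketch of the idiom follows; the `square` worker and the pool are placeholders, and in the report above it is `model.fit(cores=2)` that belongs inside the guard.

```python
import multiprocessing as mp

def square(x):
    # Stand-in for the real per-process work.
    return x * x

def main():
    # Anything that starts worker processes -- here a Pool, in the
    # report above model.fit(cores=2) -- belongs inside the guard,
    # so that children re-importing this module under "spawn" or
    # "forkserver" do not recursively launch new workers.
    with mp.Pool(processes=2) as pool:
        return pool.map(square, [1, 2, 3, 4])

if __name__ == "__main__":
    print(main())
```

With this structure, the module's top level is safe to re-import: the pool is only created in the parent process that was actually run as a script.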

> python3 test_bambi.py
/Users/harasim/Documents/repos/python/test-bambi/env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Sequential sampling (2 chains in 1 job)
NUTS: [y_sigma, x2, x1, Intercept]
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 3 seconds.

I used a fresh installation in a virtual env with Python 3.9.7 on macOS 11.6, on a machine with a 2.6 GHz 6-core Intel Core i7.

> python3 -m pip install bambi        
Collecting bambi
  Using cached bambi-0.7.1-py3-none-any.whl (72 kB)
Collecting scipy>=1.7.0
  Using cached scipy-1.7.3-cp39-cp39-macosx_10_9_x86_64.whl (33.2 MB)
Collecting formulae==0.2.0
  Using cached formulae-0.2.0-py3-none-any.whl (43 kB)
Collecting pymc3>=3.9.0
  Using cached pymc3-3.11.4-py3-none-any.whl (869 kB)
Collecting pandas>=1.0.0
  Using cached pandas-1.4.0-cp39-cp39-macosx_10_9_x86_64.whl (11.5 MB)
Collecting statsmodels>=0.9
  Using cached statsmodels-0.13.1-cp39-cp39-macosx_10_15_x86_64.whl (9.6 MB)
Collecting arviz>=0.11.2
  Using cached arviz-0.11.4-py3-none-any.whl (1.6 MB)
Collecting numpy<1.22.0,>=1.16.1
  Using cached numpy-1.21.5-cp39-cp39-macosx_10_9_x86_64.whl (17.0 MB)
Collecting xarray>=0.16.1
  Using cached xarray-0.20.2-py3-none-any.whl (845 kB)
Collecting packaging
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Collecting typing-extensions<4,>=3.7.4.3
  Using cached typing_extensions-3.10.0.2-py3-none-any.whl (26 kB)
Requirement already satisfied: setuptools>=38.4 in ./env/lib/python3.9/site-packages (from arviz>=0.11.2->bambi) (57.4.0)
Collecting netcdf4
  Using cached netCDF4-1.5.8-cp39-cp39-macosx_10_9_x86_64.whl (4.2 MB)
Collecting matplotlib>=3.0
  Using cached matplotlib-3.5.1-cp39-cp39-macosx_10_9_x86_64.whl (7.3 MB)
Collecting python-dateutil>=2.8.1
  Using cached python_dateutil-2.8.2-py2.py3-none-any.whl (247 kB)
Collecting pytz>=2020.1
  Using cached pytz-2021.3-py2.py3-none-any.whl (503 kB)
Collecting theano-pymc==1.1.2
  Using cached Theano_PyMC-1.1.2-py3-none-any.whl
Collecting semver>=2.13.0
  Using cached semver-2.13.0-py2.py3-none-any.whl (12 kB)
Collecting fastprogress>=0.2.0
  Using cached fastprogress-1.0.0-py3-none-any.whl (12 kB)
Collecting cachetools>=4.2.1
  Using cached cachetools-5.0.0-py3-none-any.whl (9.1 kB)
Collecting patsy>=0.5.1
  Using cached patsy-0.5.2-py2.py3-none-any.whl (233 kB)
Collecting dill
  Using cached dill-0.3.4-py2.py3-none-any.whl (86 kB)
Collecting filelock
  Using cached filelock-3.4.2-py3-none-any.whl (9.9 kB)
Collecting cycler>=0.10
  Using cached cycler-0.11.0-py3-none-any.whl (6.4 kB)
Collecting fonttools>=4.22.0
  Using cached fonttools-4.29.0-py3-none-any.whl (895 kB)
Collecting pillow>=6.2.0
  Using cached Pillow-9.0.0-cp39-cp39-macosx_10_10_x86_64.whl (3.0 MB)
Collecting pyparsing>=2.2.1
  Using cached pyparsing-3.0.7-py3-none-any.whl (98 kB)
Collecting kiwisolver>=1.0.1
  Using cached kiwisolver-1.3.2-cp39-cp39-macosx_10_9_x86_64.whl (61 kB)
Collecting six
  Using cached six-1.16.0-py2.py3-none-any.whl (11 kB)
Collecting cftime
  Using cached cftime-1.5.2-cp39-cp39-macosx_10_9_x86_64.whl (222 kB)
Installing collected packages: six, pytz, python-dateutil, pyparsing, numpy, pillow, pandas, packaging, kiwisolver, fonttools, cycler, cftime, xarray, typing-extensions, scipy, netcdf4, matplotlib, filelock, theano-pymc, semver, patsy, fastprogress, dill, cachetools, arviz, statsmodels, pymc3, formulae, bambi
Successfully installed arviz-0.11.4 bambi-0.7.1 cachetools-5.0.0 cftime-1.5.2 cycler-0.11.0 dill-0.3.4 fastprogress-1.0.0 filelock-3.4.2 fonttools-4.29.0 formulae-0.2.0 kiwisolver-1.3.2 matplotlib-3.5.1 netcdf4-1.5.8 numpy-1.21.5 packaging-21.3 pandas-1.4.0 patsy-0.5.2 pillow-9.0.0 pymc3-3.11.4 pyparsing-3.0.7 python-dateutil-2.8.2 pytz-2021.3 scipy-1.7.3 semver-2.13.0 six-1.16.0 statsmodels-0.13.1 theano-pymc-1.1.2 typing-extensions-3.10.0.2 xarray-0.20.2

Issue Analytics

  • State: closed
  • Created: 2 years ago
  • Comments: 7

Top GitHub Comments

1 reaction
hadjipantelis commented, Jan 31, 2022

Just for reference, I use both a Mac and an Ubuntu system, and on the Mac @dharasim's code worked fine. The error is not reproducible on my side either.

Python 3.9.10 | packaged by conda-forge | (main, Jan 28 2022, 19:24:57) 
[Clang 11.1.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import bambi as bmb
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
/Users/phadjipa/opt/anaconda3/envs/pymc3_env/lib/python3.9/site-packages/statsmodels/compat/pandas.py:65: FutureWarning: pandas.Int64Index is deprecated and will be removed from pandas in a future version. Use pandas.Index with the appropriate dtype instead.
  from pandas import Int64Index as NumericIndex
>>> import pandas as pd
>>> import numpy as np
>>> 
>>> data = pd.DataFrame({
...     "y": np.random.normal(size=50),
...     "x1": np.random.normal(size=50),
...     "x2": np.random.normal(size=50)
... })
>>> 
>>> model = bmb.Model("y ~ x1 + x2", data)
>>> fitted = model.fit(cores=2)
Auto-assigning NUTS sampler...
Initializing NUTS using jitter+adapt_diag...
Multiprocess sampling (2 chains in 2 jobs)
NUTS: [y_sigma, x2, x1, Intercept]
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
WARNING (theano.tensor.blas): Using NumPy C-API based implementation for BLAS functions.
Sampling 2 chains for 1_000 tune and 1_000 draw iterations (2_000 + 2_000 draws total) took 12 seconds.
>>> 

I am on macOS 11.5.2. It seems to me that this is Python-related rather than bambi-related. I know this is a long shot, but maybe you want to try a clean conda environment where you install pymc3 according to https://github.com/pymc-devs/pymc/wiki/Installation-Guide-(MacOS)?
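One workaround sometimes suggested for this class of macOS multiprocessing failures (an assumption here, not something verified in this thread) is to force the `fork` start method before sampling, restoring the pre-Python-3.8 default behaviour:

```python
import multiprocessing as mp

if __name__ == "__main__":
    # Python 3.8+ on macOS defaults to "spawn", and PyMC3 may pick
    # "forkserver"; both re-import the main script in every worker.
    # Forcing "fork" sidesteps the re-import entirely (with the usual
    # caveats about fork-safety of threaded BLAS or GUI libraries).
    mp.set_start_method("fork", force=True)
    # ... model.fit(cores=2) would follow here ...
```

Note that `fork` carries its own risks on macOS (system libraries are not always fork-safe), so the `if __name__ == '__main__':` guard remains the more portable fix.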

0 reactions
tomicapretto commented, Feb 1, 2022

@dharasim thanks for reporting a possible solution! I was completely unaware of this problem. Glad you found a solution!


