TerminatedWorkerError

following up on #456

I am running into a TerminatedWorkerError.

Minimal example:

import pandas as pd
import pandas_profiling

plannet_data = pd.read_csv('https://github.com/mwaskom/seaborn-data/blob/master/raw/planets.csv')
display(plannet_data) # ok
plannet_data.profile_report()

Returns the error:

TerminatedWorkerError: A worker process managed by the executor was unexpectedly terminated. This could be caused by a segmentation fault while calling the function or by an excessive memory usage causing the Operating System to kill the worker. The exit codes of the workers are {EXIT(1)}
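For readers unfamiliar with this class of error, here is a minimal standard-library sketch of the same failure mode: a worker process that dies without raising a Python exception. It does not use joblib or loky. It relies on `concurrent.futures`, whose `BrokenProcessPool` is the closest standard-library counterpart to `TerminatedWorkerError`. The function names are hypothetical, and the "fork" start method is an assumption that limits the sketch to POSIX systems such as the Linux setup below.

```python
import multiprocessing as mp
import os
from concurrent.futures import ProcessPoolExecutor
from concurrent.futures.process import BrokenProcessPool


def die_abruptly() -> None:
    """Simulate a worker that is killed mid-task (exit code 1, no Python exception)."""
    os._exit(1)


def run_and_report() -> str:
    """Submit the crashing task and return the name of the exception it causes."""
    # Assumption: the "fork" start method is available (POSIX only).
    ctx = mp.get_context("fork")
    with ProcessPoolExecutor(max_workers=1, mp_context=ctx) as pool:
        future = pool.submit(die_abruptly)
        try:
            future.result()
        except BrokenProcessPool as exc:
            # The parent only learns that the worker vanished, not why.
            return type(exc).__name__
    return "no error"
```

As with the report above, the parent process sees only that a worker disappeared, so the exit code and the system logs are the main clues to the actual cause.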

Environment:

Linux Mint 19.3 Tricia, base: Ubuntu 18.04 bionic
RAM 16 GB

python 3.6.9

astropy==4.0.1.post1
async-generator==1.10
attrs==19.3.0
autopep8==1.5.2
backcall==0.1.0
bleach==3.1.5
certifi==2020.4.5.1
chardet==3.0.4
confuse==1.1.0
cycler==0.10.0
decorator==4.4.2
defusedxml==0.6.0
descartes==1.1.0
entrypoints==0.3
htmlmin==0.1.12
idna==2.9
ImageHash==4.1.0
importlib-metadata==1.6.0
ipykernel==5.2.1
ipython==7.14.0
ipython-genutils==0.2.0
ipywidgets==7.5.1
jedi==0.17.0
Jinja2==2.11.2
joblib==0.15.1
json5==0.9.4
jsonschema==3.2.0
jupyter-client==6.1.3
jupyter-core==4.6.3
jupyter-server==0.1.1
jupyterlab==2.1.2
jupyterlab-pygments==0.1.1
jupyterlab-server==1.1.4
kiwisolver==1.2.0
llvmlite==0.32.1
MarkupSafe==1.1.1
matplotlib==3.2.1
missingno==0.4.2
mistune==0.8.4
mizani==0.6.0
nbconvert==5.6.1
nbformat==5.0.6
networkx==2.4
notebook==6.0.3
numba==0.49.1
numpy==1.18.4
packaging==20.3
palettable==3.3.0
pandas==1.0.3
pandas-profiling==2.8.0
pandocfilters==1.4.2
parso==0.7.0
patsy==0.5.1
pexpect==4.8.0
phik==0.9.12
pickleshare==0.7.5
Pillow==7.1.2
plotnine==0.6.0
prometheus-client==0.7.1
prompt-toolkit==3.0.5
ptyprocess==0.6.0
pycodestyle==2.6.0
Pygments==2.6.1
pyparsing==2.4.7
pyrsistent==0.16.0
python-dateutil==2.8.1
pytz==2020.1
PyWavelets==1.1.1
PyYAML==5.3.1
pyzmq==19.0.1
requests==2.23.0
scipy==1.4.1
seaborn==0.10.1
Send2Trash==1.5.0
six==1.14.0
statsmodels==0.11.1
tangled-up-in-unicode==0.0.6
terminado==0.8.3
testpath==0.4.4
tornado==6.0.4
tqdm==4.46.0
traitlets==4.3.3
urllib3==1.25.9
visions==0.4.4
voila==0.1.21
wcwidth==0.1.9
webencodings==0.5.1
widgetsnbextension==3.5.1
zipp==3.1.0

thanks David

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 6 (1 by maintainers)

Top GitHub Comments

1 reaction
vishalsrao commented, Nov 15, 2021

The error occurs while the phik computation is being parallelized.

As a workaround, parallelization can be disabled by replacing phik.phik_matrix with an otherwise identical function whose njobs parameter defaults to 1 instead of -1.

import phik
from typing import Tuple, Union, Optional
from phik.binning import auto_bin_data
from phik.phik import phik_from_rebinned_df
import numpy as np
import pandas as pd  # needed for the pd.DataFrame annotations below

# Same as phik.phik_matrix except for the default value of njobs
def phik_matrix_nJobsDefVal(
    df: pd.DataFrame,
    interval_cols: Optional[list] = None,
    bins: Union[int, list, np.ndarray, dict] = 10,
    quantile: bool = False,
    noise_correction: bool = True,
    dropna: bool = True,
    drop_underflow: bool = True,
    drop_overflow: bool = True,
    verbose: bool = True,
    njobs: int = 1,
) -> pd.DataFrame:
    """
    Correlation matrix of bivariate gaussian derived from chi2-value
    Chi2-value gets converted into correlation coefficient of bivariate gauss
    with correlation value rho, assuming giving binning and number of records.
    Correlation coefficient value is between 0 and 1.
    Bivariate gaussian's range is set to [-5,5] by construction.
    :param pd.DataFrame df: input data
    :param list interval_cols: column names of columns with interval variables.
    :param bins: number of bins, or a list of bin edges (same for all columns), or a dictionary where per column the bins are specified. (default=10)\
    E.g.: bins = {'mileage':5, 'driver_age':[18,25,35,45,55,65,125]}
    :param quantile: when bins is an integer, uniform bins (False) or bins based on quantiles (True)
    :param bool noise_correction: apply noise correction in phik calculation
    :param bool dropna: remove NaN values with True
    :param bool drop_underflow: do not take into account records in underflow bin when True (relevant when binning\
    a numeric variable)
    :param bool drop_overflow: do not take into account records in overflow bin when True (relevant when binning\
    a numeric variable)
    :param bool verbose: if False, do not print all interval columns that are guessed
    :param int njobs: number of parallel jobs used for calculation of phik. default is 1 here (-1 in phik.phik_matrix). 1 uses no parallel jobs.
    :return: phik correlation matrix
    """

    data_binned, binning_dict = auto_bin_data(
        df=df,
        interval_cols=interval_cols,
        bins=bins,
        quantile=quantile,
        dropna=dropna,
        verbose=verbose,
    )
    return phik_from_rebinned_df(
        data_binned,
        noise_correction,
        dropna=dropna,
        drop_underflow=drop_underflow,
        drop_overflow=drop_overflow,
        njobs=njobs,
    )

phik.phik_matrix = phik_matrix_nJobsDefVal
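
A shorter variant of the same idea is to wrap the existing function instead of copying its body. The sketch below is illustrative only: `pin_keyword` and `fake_phik` are hypothetical names, and the stand-in object replaces the real library so that the example runs without phik installed. It assumes the target function accepts `njobs` as a keyword argument, as in the comment above. Whether pandas-profiling picks up a patched `phik.phik_matrix` depends on how it looks the function up, which has not been verified here.

```python
import functools
import types


def pin_keyword(module, func_name: str, **pinned):
    """Replace module.func_name with a wrapper that uses `pinned` as keyword defaults."""
    original = getattr(module, func_name)

    @functools.wraps(original)
    def wrapper(*args, **kwargs):
        # Keyword arguments passed by the caller override the pinned defaults.
        # Passing a pinned argument positionally would raise a TypeError.
        return original(*args, **{**pinned, **kwargs})

    setattr(module, func_name, wrapper)
    return original  # keep a handle so the patch can be undone later


# Hypothetical stand-in for the real library.
fake_phik = types.SimpleNamespace(
    phik_matrix=lambda df, njobs=-1: f"computed with njobs={njobs}"
)

pin_keyword(fake_phik, "phik_matrix", njobs=1)
```

Under those assumptions, the equivalent call against the real library would be `pin_keyword(phik, "phik_matrix", njobs=1)`.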

0 reactions
aquemy commented, Dec 6, 2022

Hi,

We were not able to reproduce this with the current version. My guess is that it is environment-related.

The workaround proposed above disables the call to joblib.Parallel in the phik library, but it does not solve the underlying issue. You might want to report it to PhiK directly: https://github.com/KaveIO/PhiK

Feel free to re-open if you have a way to reproduce consistently.
