ChildProcessError when running dedupe.partition in a Jupyter Notebook on OSX

See original GitHub issue

I am running dedupe in a Jupyter notebook on Mac. When I run this line of code:

groups = deduper.partition(data, threshold=.7)

I get this error at the same place each time, 360000:

INFO:dedupe.blocking:340000, 10.0376252 seconds
INFO:dedupe.blocking:350000, 10.3528052 seconds
INFO:dedupe.blocking:360000, 10.6705842 seconds
---------------------------------------------------------------------------
ChildProcessError                         Traceback (most recent call last)
 in 
----> 1 groups = deduper.partition(data, threshold=.7)

~/opt/anaconda3/lib/python3.7/site-packages/dedupe/api.py in partition(self, data, threshold)
    168         """
    169         pairs = self.pairs(data)
--> 170         pair_scores = self.score(pairs)
    171         clusters = self.cluster(pair_scores, threshold)
    172 

~/opt/anaconda3/lib/python3.7/site-packages/dedupe/api.py in score(self, pairs)
    104                                            self.data_model,
    105                                            self.classifier,
--> 106                                            self.num_cores)
    107         except RuntimeError:
    108             raise RuntimeError('''

~/opt/anaconda3/lib/python3.7/site-packages/dedupe/core.py in scoreDuplicates(record_pairs, data_model, classifier, num_cores)
    247     result = result_queue.get()
    248     if isinstance(result, Exception):
--> 249         raise ChildProcessError
    250 
    251     if result:

ChildProcessError: 

It looked like the num_cores setting had something to do with it, but I’ve tried setting it to None, 1, and 2, and all give the same outcome.

I found this issue, which sounded somewhat familiar. So in case it helps here is the output of:

import numpy
print(numpy.__config__.__dict__)
{'__name__': 'numpy.__config__', '__doc__': None, '__package__': 'numpy', '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7fa3571b59d0>, '__spec__': ModuleSpec(name='numpy.__config__', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7fa3571b59d0>, origin='/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__config__.py'), '__file__': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__config__.py', '__cached__': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__pycache__/__config__.cpython-37.pyc', '__builtins__': {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, 'any': <built-in function any>, 'ascii': <built-in function ascii>, 'bin': <built-in function bin>, 'breakpoint': <built-in function breakpoint>, 'callable': <built-in function callable>, 'chr': <built-in function chr>, 'compile': <built-in function compile>, 'delattr': <built-in function delattr>, 'dir': <built-in function dir>, 'divmod': <built-in function divmod>, 'eval': <built-in function eval>, 'exec': <built-in function exec>, 'format': <built-in function format>, 'getattr': <built-in function getattr>, 'globals': <built-in function globals>, 'hasattr': <built-in function hasattr>, 'hash': <built-in function hash>, 'hex': <built-in function hex>, 'id': <built-in function id>, 'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7fa356643ed0>>, 'isinstance': <built-in function isinstance>, 'issubclass': <built-in function 
issubclass>, 'iter': <built-in function iter>, 'len': <built-in function len>, 'locals': <built-in function locals>, 'max': <built-in function max>, 'min': <built-in function min>, 'next': <built-in function next>, 'oct': <built-in function oct>, 'ord': <built-in function ord>, 'pow': <built-in function pow>, 'print': <built-in function print>, 'repr': <built-in function repr>, 'round': <built-in function round>, 'setattr': <built-in function setattr>, 'sorted': <built-in function sorted>, 'sum': <built-in function sum>, 'vars': <built-in function vars>, 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': <class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, '__debug__': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 
'RuntimeError'>, 'RecursionError': <class 'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'MemoryError': <class 'MemoryError'>, 'BufferError': <class 'BufferError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 
'FileNotFoundError': <class 'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': <built-in function open>, 'copyright': Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.

Copyright (c) 2000 BeOpen.com.
All Rights Reserved.

Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.

Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits':     Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
    for supporting Python development.  See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., '__IPYTHON__': True, 'display': <function display at 0x7fa355065830>, '__pybind11_internals_v3_clang_libcpp_cxxabi1002__': <capsule object NULL at 0x7fa359712db0>, 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7fa356643b90>>}, '__all__': ['get_info', 'show'], 'os': <module 'os' from '/Users/calebkeller/opt/anaconda3/lib/python3.7/os.py'>, 'sys': <module 'sys' (built-in)>, 'extra_dll_dir': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/.libs', 'blas_mkl_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'blas_opt_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'lapack_mkl_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'lapack_opt_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'get_info': <function get_info at 0x7fa3571b07a0>, 'show': <function show at 0x7fa3571b0dd0>}

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 15 (6 by maintainers)

Top GitHub Comments

1 reaction
marianadpl commented, Oct 21, 2020

Hi @fjsj, that worked. Thank you so much for all your help. Much appreciated! 😃

1 reaction
fjsj commented, Aug 28, 2020

Folks, remember to try with a single core, otherwise your real error will be masked by ChildProcessError.
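To illustrate why the real error gets hidden, here is a minimal sketch of the pattern visible in the traceback above: a worker puts its exception on a queue, and the parent raises a bare ChildProcessError instead of re-raising it. This is not dedupe's code. The function names are hypothetical, and a thread with queue.Queue stands in for the child process to keep the example small.

```python
import queue
import threading


def worker(records, result_queue):
    # Hypothetical worker: on failure, the exception object is sent back
    # through the queue in place of a result.
    try:
        result_queue.put(sum(1 / len(r) for r in records))
    except Exception as exc:
        result_queue.put(exc)


def score_masked(records):
    # Parent side, following the pattern in the traceback: the original
    # exception is dropped and a bare ChildProcessError is raised.
    result_queue = queue.Queue()
    t = threading.Thread(target=worker, args=(records, result_queue))
    t.start()
    t.join()
    result = result_queue.get()
    if isinstance(result, Exception):
        raise ChildProcessError
    return result


def score_direct(records):
    # Analogue of running the work in the caller: nothing wraps the
    # exception, so the real error (here ZeroDivisionError) surfaces.
    return sum(1 / len(r) for r in records)
```

Calling score_masked(["abc", ""]) raises an empty ChildProcessError, while score_direct(["abc", ""]) raises the underlying ZeroDivisionError. That is the kind of difference to look for when retrying with a single core.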

And if the empty strings in your dataset are not blocked together, they won’t be scored and you won’t see the ZeroDivisionError.
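If empty strings turn out to be the culprit, one possible workaround is to replace them with None before handing the data to dedupe. As far as I know, dedupe expects missing values to be None rather than empty strings. The sketch below assumes the data is a dict of record id to field dict; blank_to_none and the sample records are hypothetical, not part of dedupe.

```python
def blank_to_none(data):
    """Return a copy of `data` with empty or whitespace-only strings set to None."""
    cleaned = {}
    for record_id, record in data.items():
        cleaned[record_id] = {
            # Only strings are checked; other values pass through unchanged.
            field: (None if isinstance(value, str) and not value.strip() else value)
            for field, value in record.items()
        }
    return cleaned


# Hypothetical sample records for illustration only.
data = {
    1: {"name": "Acme Corp", "city": ""},
    2: {"name": "  ", "city": "Chicago"},
}
cleaned = blank_to_none(data)
```

The original dict is left untouched, so you can compare results with and without the cleanup.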
