ChildProcessError when running deduper.partition in a Jupyter Notebook on OSX
I am running dedupe in a Jupyter notebook on Mac. When I run this line of code:
groups = deduper.partition(data, threshold=.7)
I get this error at the same place each time, 360000:
INFO:dedupe.blocking:340000, 10.0376252 seconds
INFO:dedupe.blocking:350000, 10.3528052 seconds
INFO:dedupe.blocking:360000, 10.6705842 seconds
---------------------------------------------------------------------------
ChildProcessError Traceback (most recent call last)
in
----> 1 groups = deduper.partition(data, threshold=.7)
~/opt/anaconda3/lib/python3.7/site-packages/dedupe/api.py in partition(self, data, threshold)
168 """
169 pairs = self.pairs(data)
--> 170 pair_scores = self.score(pairs)
171 clusters = self.cluster(pair_scores, threshold)
172
~/opt/anaconda3/lib/python3.7/site-packages/dedupe/api.py in score(self, pairs)
104 self.data_model,
105 self.classifier,
--> 106 self.num_cores)
107 except RuntimeError:
108 raise RuntimeError('''
~/opt/anaconda3/lib/python3.7/site-packages/dedupe/core.py in scoreDuplicates(record_pairs, data_model, classifier, num_cores)
247 result = result_queue.get()
248 if isinstance(result, Exception):
--> 249 raise ChildProcessError
250
251 if result:
ChildProcessError:
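The last frame of the traceback shows the parent process raising a bare ChildProcessError whenever a worker hands back an exception, so the original error type and message are lost. Below is a minimal sketch of that pattern, not dedupe's actual code: it uses a thread and queue.Queue as a stand-in for the worker processes so it runs anywhere, and score_pair is a hypothetical comparator invented to fail on two empty strings.

```python
import queue
import threading


def score_pair(a, b):
    # Hypothetical comparator (an assumption, not dedupe's scoring):
    # shared characters divided by combined length, so ("", "") divides by zero.
    return len(set(a) & set(b)) / (len(a) + len(b))


def worker(pairs, result_queue):
    # The worker catches its own failure and sends the exception object back.
    try:
        result_queue.put([score_pair(a, b) for a, b in pairs])
    except Exception as exc:
        result_queue.put(exc)


def score_with_worker(pairs):
    # Mirrors the check in the traceback: any exception coming back from
    # the worker is replaced by a bare ChildProcessError.
    result_queue = queue.Queue()
    thread = threading.Thread(target=worker, args=(pairs, result_queue))
    thread.start()
    result = result_queue.get()
    thread.join()
    if isinstance(result, Exception):
        raise ChildProcessError
    return result


def score_in_process(pairs):
    # No worker involved, so the original exception propagates unchanged.
    return [score_pair(a, b) for a, b in pairs]


def failure_name(func, pairs):
    # Report which exception type the caller actually sees.
    try:
        func(pairs)
    except Exception as exc:
        return type(exc).__name__
    return "ok"


pairs = [("abc", "abd"), ("", "")]
masked = failure_name(score_with_worker, pairs)
real = failure_name(score_in_process, pairs)
print(masked, real)
```

In this sketch the caller of score_with_worker only learns that something failed in the worker, while score_in_process surfaces the underlying exception type.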
It looks like the num_cores setting has something to do with it. I've tried setting it to None, 1, and 2, and all have the same outcome.
I found this issue, which sounded somewhat familiar. So in case it helps here is the output of:
import numpy
print(numpy.__config__.__dict__)
{'__name__': 'numpy.__config__', '__doc__': None, '__package__': 'numpy', '__loader__': <_frozen_importlib_external.SourceFileLoader object at 0x7fa3571b59d0>, '__spec__': ModuleSpec(name='numpy.__config__', loader=<_frozen_importlib_external.SourceFileLoader object at 0x7fa3571b59d0>, origin='/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__config__.py'), '__file__': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__config__.py', '__cached__': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/__pycache__/__config__.cpython-37.pyc', '__builtins__': {'__name__': 'builtins', '__doc__': "Built-in functions, exceptions, and other objects.\n\nNoteworthy: None is the `nil' object; Ellipsis represents `...' in slices.", '__package__': '', '__loader__': <class '_frozen_importlib.BuiltinImporter'>, '__spec__': ModuleSpec(name='builtins', loader=<class '_frozen_importlib.BuiltinImporter'>), '__build_class__': <built-in function __build_class__>, '__import__': <built-in function __import__>, 'abs': <built-in function abs>, 'all': <built-in function all>, 'any': <built-in function any>, 'ascii': <built-in function ascii>, 'bin': <built-in function bin>, 'breakpoint': <built-in function breakpoint>, 'callable': <built-in function callable>, 'chr': <built-in function chr>, 'compile': <built-in function compile>, 'delattr': <built-in function delattr>, 'dir': <built-in function dir>, 'divmod': <built-in function divmod>, 'eval': <built-in function eval>, 'exec': <built-in function exec>, 'format': <built-in function format>, 'getattr': <built-in function getattr>, 'globals': <built-in function globals>, 'hasattr': <built-in function hasattr>, 'hash': <built-in function hash>, 'hex': <built-in function hex>, 'id': <built-in function id>, 'input': <bound method Kernel.raw_input of <ipykernel.ipkernel.IPythonKernel object at 0x7fa356643ed0>>, 'isinstance': <built-in function isinstance>, 'issubclass': <built-in function 
issubclass>, 'iter': <built-in function iter>, 'len': <built-in function len>, 'locals': <built-in function locals>, 'max': <built-in function max>, 'min': <built-in function min>, 'next': <built-in function next>, 'oct': <built-in function oct>, 'ord': <built-in function ord>, 'pow': <built-in function pow>, 'print': <built-in function print>, 'repr': <built-in function repr>, 'round': <built-in function round>, 'setattr': <built-in function setattr>, 'sorted': <built-in function sorted>, 'sum': <built-in function sum>, 'vars': <built-in function vars>, 'None': None, 'Ellipsis': Ellipsis, 'NotImplemented': NotImplemented, 'False': False, 'True': True, 'bool': <class 'bool'>, 'memoryview': <class 'memoryview'>, 'bytearray': <class 'bytearray'>, 'bytes': <class 'bytes'>, 'classmethod': <class 'classmethod'>, 'complex': <class 'complex'>, 'dict': <class 'dict'>, 'enumerate': <class 'enumerate'>, 'filter': <class 'filter'>, 'float': <class 'float'>, 'frozenset': <class 'frozenset'>, 'property': <class 'property'>, 'int': <class 'int'>, 'list': <class 'list'>, 'map': <class 'map'>, 'object': <class 'object'>, 'range': <class 'range'>, 'reversed': <class 'reversed'>, 'set': <class 'set'>, 'slice': <class 'slice'>, 'staticmethod': <class 'staticmethod'>, 'str': <class 'str'>, 'super': <class 'super'>, 'tuple': <class 'tuple'>, 'type': <class 'type'>, 'zip': <class 'zip'>, '__debug__': True, 'BaseException': <class 'BaseException'>, 'Exception': <class 'Exception'>, 'TypeError': <class 'TypeError'>, 'StopAsyncIteration': <class 'StopAsyncIteration'>, 'StopIteration': <class 'StopIteration'>, 'GeneratorExit': <class 'GeneratorExit'>, 'SystemExit': <class 'SystemExit'>, 'KeyboardInterrupt': <class 'KeyboardInterrupt'>, 'ImportError': <class 'ImportError'>, 'ModuleNotFoundError': <class 'ModuleNotFoundError'>, 'OSError': <class 'OSError'>, 'EnvironmentError': <class 'OSError'>, 'IOError': <class 'OSError'>, 'EOFError': <class 'EOFError'>, 'RuntimeError': <class 
'RuntimeError'>, 'RecursionError': <class 'RecursionError'>, 'NotImplementedError': <class 'NotImplementedError'>, 'NameError': <class 'NameError'>, 'UnboundLocalError': <class 'UnboundLocalError'>, 'AttributeError': <class 'AttributeError'>, 'SyntaxError': <class 'SyntaxError'>, 'IndentationError': <class 'IndentationError'>, 'TabError': <class 'TabError'>, 'LookupError': <class 'LookupError'>, 'IndexError': <class 'IndexError'>, 'KeyError': <class 'KeyError'>, 'ValueError': <class 'ValueError'>, 'UnicodeError': <class 'UnicodeError'>, 'UnicodeEncodeError': <class 'UnicodeEncodeError'>, 'UnicodeDecodeError': <class 'UnicodeDecodeError'>, 'UnicodeTranslateError': <class 'UnicodeTranslateError'>, 'AssertionError': <class 'AssertionError'>, 'ArithmeticError': <class 'ArithmeticError'>, 'FloatingPointError': <class 'FloatingPointError'>, 'OverflowError': <class 'OverflowError'>, 'ZeroDivisionError': <class 'ZeroDivisionError'>, 'SystemError': <class 'SystemError'>, 'ReferenceError': <class 'ReferenceError'>, 'MemoryError': <class 'MemoryError'>, 'BufferError': <class 'BufferError'>, 'Warning': <class 'Warning'>, 'UserWarning': <class 'UserWarning'>, 'DeprecationWarning': <class 'DeprecationWarning'>, 'PendingDeprecationWarning': <class 'PendingDeprecationWarning'>, 'SyntaxWarning': <class 'SyntaxWarning'>, 'RuntimeWarning': <class 'RuntimeWarning'>, 'FutureWarning': <class 'FutureWarning'>, 'ImportWarning': <class 'ImportWarning'>, 'UnicodeWarning': <class 'UnicodeWarning'>, 'BytesWarning': <class 'BytesWarning'>, 'ResourceWarning': <class 'ResourceWarning'>, 'ConnectionError': <class 'ConnectionError'>, 'BlockingIOError': <class 'BlockingIOError'>, 'BrokenPipeError': <class 'BrokenPipeError'>, 'ChildProcessError': <class 'ChildProcessError'>, 'ConnectionAbortedError': <class 'ConnectionAbortedError'>, 'ConnectionRefusedError': <class 'ConnectionRefusedError'>, 'ConnectionResetError': <class 'ConnectionResetError'>, 'FileExistsError': <class 'FileExistsError'>, 
'FileNotFoundError': <class 'FileNotFoundError'>, 'IsADirectoryError': <class 'IsADirectoryError'>, 'NotADirectoryError': <class 'NotADirectoryError'>, 'InterruptedError': <class 'InterruptedError'>, 'PermissionError': <class 'PermissionError'>, 'ProcessLookupError': <class 'ProcessLookupError'>, 'TimeoutError': <class 'TimeoutError'>, 'open': <built-in function open>, 'copyright': Copyright (c) 2001-2019 Python Software Foundation.
All Rights Reserved.
Copyright (c) 2000 BeOpen.com.
All Rights Reserved.
Copyright (c) 1995-2001 Corporation for National Research Initiatives.
All Rights Reserved.
Copyright (c) 1991-1995 Stichting Mathematisch Centrum, Amsterdam.
All Rights Reserved., 'credits': Thanks to CWI, CNRI, BeOpen.com, Zope Corporation and a cast of thousands
for supporting Python development. See www.python.org for more information., 'license': Type license() to see the full license text, 'help': Type help() for interactive help, or help(object) for help about object., '__IPYTHON__': True, 'display': <function display at 0x7fa355065830>, '__pybind11_internals_v3_clang_libcpp_cxxabi1002__': <capsule object NULL at 0x7fa359712db0>, 'get_ipython': <bound method InteractiveShell.get_ipython of <ipykernel.zmqshell.ZMQInteractiveShell object at 0x7fa356643b90>>}, '__all__': ['get_info', 'show'], 'os': <module 'os' from '/Users/calebkeller/opt/anaconda3/lib/python3.7/os.py'>, 'sys': <module 'sys' (built-in)>, 'extra_dll_dir': '/Users/calebkeller/opt/anaconda3/lib/python3.7/site-packages/numpy/.libs', 'blas_mkl_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'blas_opt_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'lapack_mkl_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'lapack_opt_info': {'libraries': ['mkl_rt', 'pthread'], 'library_dirs': ['/Users/calebkeller/opt/anaconda3/lib'], 'define_macros': [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)], 'include_dirs': ['/Users/calebkeller/opt/anaconda3/include']}, 'get_info': <function get_info at 0x7fa3571b07a0>, 'show': <function show at 0x7fa3571b0dd0>}
Issue Analytics
- State:
- Created 3 years ago
- Comments:15 (6 by maintainers)
Top GitHub Comments
Hi @fjsj, that worked. Thank you so much for all your help. Much appreciated! 😃
Folks, remember to try with a single core; otherwise your real error will be masked by ChildProcessError. And if the empty strings in your dataset are not blocked together, they won't be scored and you won't see the ZeroDivisionError.
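Since the comment above points at empty strings as the trigger, it may help to check the input before calling partition. The following is a rough sketch under stated assumptions: data is a dict mapping record ids to field dicts, find_empty_fields and blank_to_none are hypothetical helper names, and replacing blanks with None is only a guess at a reasonable cleanup, not a fix confirmed in this thread.

```python
def is_blank(value):
    # Treat empty and whitespace-only strings as blank.
    return isinstance(value, str) and not value.strip()


def find_empty_fields(data):
    # Hypothetical helper: map each record id to its blank string fields.
    report = {}
    for record_id, record in data.items():
        fields = [name for name, value in record.items() if is_blank(value)]
        if fields:
            report[record_id] = fields
    return report


def blank_to_none(data):
    # Hypothetical helper: return a copy with blank strings replaced by None.
    # Whether None is the right placeholder depends on your field definitions.
    return {
        record_id: {
            name: (None if is_blank(value) else value)
            for name, value in record.items()
        }
        for record_id, record in data.items()
    }


sample = {
    1: {"name": "Ann", "city": ""},
    2: {"name": "Bob", "city": "Oslo"},
    3: {"name": "  ", "city": ""},
}
report = find_empty_fields(sample)
cleaned = blank_to_none(sample)
print(report)
```

The report shows which records to inspect; the cleaned copy leaves the original data untouched so the two can be compared.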