
Fit on array of size >= 4096 returns _TracedPicklingError: Failed in nopython mode pipeline (step: nopython mode backend)


A simple UMAP fit on a random NumPy array with 4096 or more rows fails with the traceback detailed below; with fewer than 4096 rows everything works fine. What's going wrong? A very similar issue: https://github.com/lmcinnes/umap/issues/477

Environment: Databricks (Spark-based) with umap==0.5.0, numba==0.52.0, pynndescent==0.5.1, scipy==1.4.1

Minimal example:

import umap
import numpy as np

reducer = umap.UMAP()
np.random.seed(0)
reducer.fit(np.random.rand(4096, 16))  # a 4095 x 16 array fits without error
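
For context: umap 0.5 computes exact nearest neighbours for datasets with fewer than 4096 samples and only hands off to pynndescent (where the failing compilation happens) at 4096 samples and above, which matches the observed cutoff. A minimal sketch, assuming the force_approximation_algorithm flag of umap 0.5 behaves as documented, that should trigger the same code path on a smaller array:

import umap
import numpy as np

# Assumption: force_approximation_algorithm forces the pynndescent
# code path even below the 4096-sample cutoff, so in the affected
# environment this should raise the same _TracedPicklingError.
reducer = umap.UMAP(force_approximation_algorithm=True)
np.random.seed(0)
reducer.fit(np.random.rand(1000, 16))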

This gives the following error:

---------------------------------------------------------------------------
PicklingError                             Traceback (most recent call last)
/databricks/python/lib/python3.7/site-packages/numba/core/serialize.py in save(self, obj)
    304         try:
--> 305             return super().save(obj)
    306         except _TracedPicklingError:

/databricks/python/lib/python3.7/pickle.py in save(self, obj, save_persistent_id)
    503         if f is not None:
--> 504             f(self, obj) # Call unbound method with explicit self
    505             return

/databricks/python/lib/python3.7/pickle.py in save_type(self, obj)
   1015             return self.save_reduce(type, (...,), obj=obj)
-> 1016         return self.save_global(obj)
   1017 

/databricks/python/lib/python3.7/pickle.py in save_global(self, obj, name)
    959                 "Can't pickle %r: it's not found as %s.%s" %
--> 960                 (obj, module_name, name)) from None
    961         else:

PicklingError: Can't pickle <class 'collections.FlatTree'>: it's not found as collections.FlatTree

During handling of the above exception, another exception occurred:

_TracedPicklingError                      Traceback (most recent call last)
<command-103842> in <module>
      3 reducer = umap.UMAP()
      4 np.random.seed(0)
----> 5 reducer.fit(np.random.rand(4096,16))

/databricks/python/lib/python3.7/site-packages/umap/umap_.py in fit(self, X, y)
   2387                 use_pynndescent=True,
   2388                 n_jobs=self.n_jobs,
-> 2389                 verbose=self.verbose,
   2390             )
   2391 

/databricks/python/lib/python3.7/site-packages/umap/umap_.py in nearest_neighbors(X, n_neighbors, metric, metric_kwds, angular, random_state, low_memory, use_pynndescent, n_jobs, verbose)
    337             low_memory=low_memory,
    338             n_jobs=n_jobs,
--> 339             verbose=verbose,
    340         )
    341         knn_indices, knn_dists = knn_search_index.neighbor_graph

/databricks/python/lib/python3.7/site-packages/pynndescent/pynndescent_.py in __init__(self, data, metric, metric_kwds, n_neighbors, n_trees, leaf_size, pruning_degree_multiplier, diversify_prob, n_search_trees, tree_init, init_graph, random_state, low_memory, max_candidates, n_iters, delta, n_jobs, compressed, verbose)
    789                 current_random_state,
    790                 self.n_jobs,
--> 791                 self._angular_trees,
    792             )
    793             leaf_array = rptree_leaf_array(self._rp_forest)

/databricks/python/lib/python3.7/site-packages/pynndescent/rp_trees.py in make_forest(data, n_neighbors, n_trees, leaf_size, rng_state, random_state, n_jobs, angular)
    999             result = joblib.Parallel(n_jobs=n_jobs, prefer="threads")(
   1000                 joblib.delayed(make_dense_tree)(data, rng_states[i], leaf_size, angular)
-> 1001                 for i in range(n_trees)
   1002             )
   1003     except (RuntimeError, RecursionError, SystemError):

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in __call__(self, iterable)
   1015 
   1016             with self._backend.retrieval_context():
-> 1017                 self.retrieve()
   1018             # Make sure that we get a last message telling us we are done
   1019             elapsed_time = time.time() - self._start_time

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in retrieve(self)
    907             try:
    908                 if getattr(self._backend, 'supports_timeout', False):
--> 909                     self._output.extend(job.get(timeout=self.timeout))
    910                 else:
    911                     self._output.extend(job.get())

/databricks/python/lib/python3.7/multiprocessing/pool.py in get(self, timeout)
    655             return self._value
    656         else:
--> 657             raise self._value
    658 
    659     def _set(self, i, obj):

/databricks/python/lib/python3.7/multiprocessing/pool.py in worker(inqueue, outqueue, initializer, initargs, maxtasks, wrap_exception)
    119         job, i, func, args, kwds = task
    120         try:
--> 121             result = (True, func(*args, **kwds))
    122         except Exception as e:
    123             if wrap_exception and func is not _helper_reraises_exception:

/databricks/python/lib/python3.7/site-packages/joblib/_parallel_backends.py in __call__(self, *args, **kwargs)
    606     def __call__(self, *args, **kwargs):
    607         try:
--> 608             return self.func(*args, **kwargs)
    609         except KeyboardInterrupt:
    610             # We capture the KeyboardInterrupt and reraise it as

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in __call__(self)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/databricks/python/lib/python3.7/site-packages/joblib/parallel.py in <listcomp>(.0)
    254         with parallel_backend(self._backend, n_jobs=self._n_jobs):
    255             return [func(*args, **kwargs)
--> 256                     for func, args, kwargs in self.items]
    257 
    258     def __len__(self):

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
    431                     e.patch_message('\n'.join((str(e).rstrip(), help_msg)))
    432             # ignore the FULL_TRACEBACKS config, this needs reporting!
--> 433             raise e
    434 
    435     def inspect_llvm(self, signature=None):

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_for_args(self, *args, **kws)
    364                 argtypes.append(self.typeof_pyval(a))
    365         try:
--> 366             return self.compile(tuple(argtypes))
    367         except errors.ForceLiteralArg as e:
    368             # Received request for compiler re-entry with the list of arguments

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, sig)
    855             self._cache_misses[sig] += 1
    856             try:
--> 857                 cres = self._compiler.compile(args, return_type)
    858             except errors.ForceLiteralArg as e:
    859                 def folded(args, kws):

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in compile(self, args, return_type)
     75 
     76     def compile(self, args, return_type):
---> 77         status, retval = self._compile_cached(args, return_type)
     78         if status:
     79             return retval

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_cached(self, args, return_type)
     89 
     90         try:
---> 91             retval = self._compile_core(args, return_type)
     92         except errors.TypingError as e:
     93             self._failed_cache[key] = e

/databricks/python/lib/python3.7/site-packages/numba/core/dispatcher.py in _compile_core(self, args, return_type)
    107                                       args=args, return_type=return_type,
    108                                       flags=flags, locals=self.locals,
--> 109                                       pipeline_class=self.pipeline_class)
    110         # Check typing error if object mode is used
    111         if cres.typing_error is not None and not flags.enable_pyobject:

/databricks/python/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(typingctx, targetctx, func, args, return_type, flags, locals, library, pipeline_class)
    600     pipeline = pipeline_class(typingctx, targetctx, library,
    601                               args, return_type, flags, locals)
--> 602     return pipeline.compile_extra(func)
    603 
    604 

/databricks/python/lib/python3.7/site-packages/numba/core/compiler.py in compile_extra(self, func)
    350         self.state.lifted = ()
    351         self.state.lifted_from = None
--> 352         return self._compile_bytecode()
    353 
    354     def compile_ir(self, func_ir, lifted=(), lifted_from=None):

/databricks/python/lib/python3.7/site-packages/numba/core/compiler.py in _compile_bytecode(self)
    412         """
    413         assert self.state.func_ir is None
--> 414         return self._compile_core()
    415 
    416     def _compile_ir(self):

/databricks/python/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    392                 self.state.status.fail_reason = e
    393                 if is_final_pipeline:
--> 394                     raise e
    395         else:
    396             raise CompilerError("All available pipelines exhausted")

/databricks/python/lib/python3.7/site-packages/numba/core/compiler.py in _compile_core(self)
    383             res = None
    384             try:
--> 385                 pm.run(self.state)
    386                 if self.state.cr is not None:
    387                     break

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    337                     (self.pipeline_name, pass_desc)
    338                 patched_exception = self._patch_error(msg, e)
--> 339                 raise patched_exception
    340 
    341     def dependency_analysis(self):

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_machinery.py in run(self, state)
    328                 pass_inst = _pass_registry.get(pss).pass_inst
    329                 if isinstance(pass_inst, CompilerPass):
--> 330                     self._runPass(idx, pass_inst, state)
    331                 else:
    332                     raise BaseException("Legacy pass in use")

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_lock.py in _acquire_compile_lock(*args, **kwargs)
     30         def _acquire_compile_lock(*args, **kwargs):
     31             with self:
---> 32                 return func(*args, **kwargs)
     33         return _acquire_compile_lock
     34 

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_machinery.py in _runPass(self, index, pss, internal_state)
    287             mutated |= check(pss.run_initialization, internal_state)
    288         with SimpleTimer() as pass_time:
--> 289             mutated |= check(pss.run_pass, internal_state)
    290         with SimpleTimer() as finalize_time:
    291             mutated |= check(pss.run_finalizer, internal_state)

/databricks/python/lib/python3.7/site-packages/numba/core/compiler_machinery.py in check(func, compiler_state)
    260 
    261         def check(func, compiler_state):
--> 262             mangled = func(compiler_state)
    263             if mangled not in (True, False):
    264                 msg = ("CompilerPass implementations should return True/False. "

/databricks/python/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    447 
    448         # TODO: Pull this out into the pipeline
--> 449         NativeLowering().run_pass(state)
    450         lowered = state['cr']
    451         signature = typing.signature(state.return_type, *state.args)

/databricks/python/lib/python3.7/site-packages/numba/core/typed_passes.py in run_pass(self, state)
    373                 lower.lower()
    374                 if not flags.no_cpython_wrapper:
--> 375                     lower.create_cpython_wrapper(flags.release_gil)
    376 
    377                 if not flags.no_cfunc_wrapper:

/databricks/python/lib/python3.7/site-packages/numba/core/lowering.py in create_cpython_wrapper(self, release_gil)
    242         self.context.create_cpython_wrapper(self.library, self.fndesc,
    243                                             self.env, self.call_helper,
--> 244                                             release_gil=release_gil)
    245 
    246     def create_cfunc_wrapper(self):

/databricks/python/lib/python3.7/site-packages/numba/core/cpu.py in create_cpython_wrapper(self, library, fndesc, env, call_helper, release_gil)
    160                                 fndesc, env, call_helper=call_helper,
    161                                 release_gil=release_gil)
--> 162         builder.build()
    163         library.add_ir_module(wrapper_module)
    164 

/databricks/python/lib/python3.7/site-packages/numba/core/callwrapper.py in build(self)
    120 
    121         api = self.context.get_python_api(builder)
--> 122         self.build_wrapper(api, builder, closure, args, kws)
    123 
    124         return wrapper, api

/databricks/python/lib/python3.7/site-packages/numba/core/callwrapper.py in build_wrapper(self, api, builder, closure, args, kws)
    185 
    186             retty = self._simplified_return_type()
--> 187             obj = api.from_native_return(retty, retval, env_manager)
    188             builder.ret(obj)
    189 

/databricks/python/lib/python3.7/site-packages/numba/core/pythonapi.py in from_native_return(self, typ, val, env_manager)
   1387                                                     "prevented the return of " \
   1388                                                     "optional value"
-> 1389         out = self.from_native_value(typ, val, env_manager)
   1390         return out
   1391 

/databricks/python/lib/python3.7/site-packages/numba/core/pythonapi.py in from_native_value(self, typ, val, env_manager)
   1401 
   1402         c = _BoxContext(self.context, self.builder, self, env_manager)
-> 1403         return impl(typ, val, c)
   1404 
   1405     def reflect_native_value(self, typ, val, env_manager=None):

/databricks/python/lib/python3.7/site-packages/numba/core/boxing.py in box_namedtuple(typ, val, c)
    502     Convert native array or structure *val* to a namedtuple object.
    503     """
--> 504     cls_obj = c.pyapi.unserialize(c.pyapi.serialize_object(typ.instance_class))
    505     tuple_obj = box_tuple(typ, val, c)
    506     obj = c.pyapi.call(cls_obj, tuple_obj)

/databricks/python/lib/python3.7/site-packages/numba/core/pythonapi.py in serialize_object(self, obj)
   1362             gv = self.module.__serialized[obj]
   1363         except KeyError:
-> 1364             struct = self.serialize_uncached(obj)
   1365             name = ".const.picklebuf.%s" % (id(obj) if config.DIFF_IR == 0 else "DIFF_IR")
   1366             gv = self.context.insert_unique_const(self.module, name, struct)

/databricks/python/lib/python3.7/site-packages/numba/core/pythonapi.py in serialize_uncached(self, obj)
   1333         """
   1334         # First make the array constant
-> 1335         data = serialize.dumps(obj)
   1336         assert len(data) < 2**31
   1337         name = ".const.pickledata.%s" % (id(obj) if config.DIFF_IR == 0 else "DIFF_IR")

/databricks/python/lib/python3.7/site-packages/numba/core/serialize.py in dumps(obj)
    166     with io.BytesIO() as buf:
    167         p = pickler(buf)
--> 168         p.dump(obj)
    169         pickled = buf.getvalue()
    170 

/databricks/python/lib/python3.7/pickle.py in dump(self, obj)
    435         if self.proto >= 4:
    436             self.framer.start_framing()
--> 437         self.save(obj)
    438         self.write(STOP)
    439         self.framer.end_framing()

/databricks/python/lib/python3.7/site-packages/numba/core/serialize.py in save(self, obj)
    312             m = (f"Failed to pickle because of\n  {type(e).__name__}: {e}"
    313                  f"\ntracing... \n{perline(self.__trace)}")
--> 314             raise _TracedPicklingError(m)
    315         finally:
    316             self.__trace.pop()

_TracedPicklingError: Failed in nopython mode pipeline (step: nopython mode backend)
Failed to pickle because of
  PicklingError: Can't pickle <class 'collections.FlatTree'>: it's not found as collections.FlatTree
tracing... 
 [0]: <class 'type'>: 94412180539632
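
The final frames explain the failure: while building the CPython wrapper, numba has to serialize the FlatTree namedtuple class that pynndescent returns (see box_namedtuple above), but the class's __module__ attribute has ended up as "collections" rather than "pynndescent.rp_trees", most likely because the Spark runtime patches collections.namedtuple. pickle's save_global therefore looks up collections.FlatTree and cannot find it. A minimal sketch of just that pickling failure, independent of umap (the field names here are illustrative, not pynndescent's actual ones):

import collections
import pickle

# Mimic the broken state: a namedtuple class whose __module__ claims
# it lives in collections, where no such attribute exists.
FlatTree = collections.namedtuple("FlatTree", ["hyperplanes", "offsets"])
FlatTree.__module__ = "collections"

pickle.dumps(FlatTree)
# PicklingError: Can't pickle <class 'collections.FlatTree'>:
# it's not found as collections.FlatTree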

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Reactions: 1
  • Comments: 8

Top GitHub Comments

2 reactions
oscarorti commented, Feb 17, 2022

Hello 🙋‍♂️

I finally found a hack. It seems that there is some weird internal overriding that makes the collections.FlatTree class not picklable.

# This is a hack to be able to use UMAP.fit_transform with more than 4095 samples.
# See the links below:
# https://github.com/lmcinnes/umap/issues/477
# https://github.com/lmcinnes/umap/issues/547
import collections
import pynndescent

collections.namedtuple("n", [], module=__name__)
# Point FlatTree back at the module that really defines it, so that
# pickle's save_global can locate the class again.
pynndescent.rp_trees.FlatTree.__module__ = "pynndescent.rp_trees"

I hope this could help you!
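
For what it's worth, here is a sketch of how the workaround above would typically be applied end to end: run it once, before the failing fit (hypothetical usage under the same package versions as the report):

import collections
import numpy as np
import pynndescent
import umap

# Workaround from the comment above.
collections.namedtuple("n", [], module=__name__)
pynndescent.rp_trees.FlatTree.__module__ = "pynndescent.rp_trees"

reducer = umap.UMAP()
np.random.seed(0)
reducer.fit(np.random.rand(4096, 16))  # should now complete instead of raising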

1 reaction
asif7adil commented, Feb 17, 2022

Just copy-paste this below your module's imports.

It is working like a charm now. 😁 Thank you so much @oscarorti! Tried it on 5k PBMC for now, will keep the community updated on 68k.
