support for nogil and parallel in AOT functions

Using the method I described in this comment: https://github.com/numba/numba/issues/6382#issuecomment-972592827, I can generate AOT-compiled code that releases the GIL. So the first feature request here is to expose an option for this compile flag.

If I enable parallel in the same way, compilation finishes, but the resulting module cannot be imported:

Traceback (most recent call last):
  File "/home/auderson/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ab29907692a6>", line 1, in <module>
    import aot_functions2 as aot2
ImportError: /home/auderson/aot_functions2.cpython-38-x86_64-linux-gnu.so: undefined symbol: numba_parallel_for

Is there any plan to support the parallel option? Related issue: https://github.com/numba/numba/issues/3336

Code to reproduce:

import os
from distutils import log

import numpy as np
from llvmlite.binding import Linkage
import llvmlite.llvmpy.core as lc

from numba import njit, prange, float64
from numba.core import cpu
from numba.pycc import CC
from numba.pycc.compiler import (ModuleCompiler, global_compiler_lock, Flags,
                                 nrtdynmod, compile_extra)


class ModuleCompilerMod(ModuleCompiler):
    def __init__(self, export_entries, module_name, use_nrt=False, flags=None, **aot_options):
        super().__init__(export_entries, module_name, use_nrt=use_nrt, **aot_options)
        self.flags = flags or {}

    @global_compiler_lock
    def _cull_exports(self):
        """Read all the exported functions/modules in the translator
        environment, and join them into a single LLVM module.
        """
        self.exported_function_types = {}
        self.function_environments = {}
        self.environment_gvs = {}

        codegen = self.context.codegen()
        library = codegen.create_library(self.module_name)

        # Generate IR for all exported functions
        flags = Flags()
        # Use .get() so a missing key (e.g. the default flags={}) means "off"
        if self.flags.get('nogil'):
            flags.release_gil = True
        if self.flags.get('parallel'):
            flags.auto_parallel = cpu.ParallelOptions(True)
        flags.no_compile = True
        if not self.export_python_wrap:
            flags.no_cpython_wrapper = True
            flags.no_cfunc_wrapper = True
        if self.use_nrt:
            flags.nrt = True
            # Compile NRT helpers
            nrt_module, _ = nrtdynmod.create_nrt_module(self.context)
            library.add_ir_module(nrt_module)

        for entry in self.export_entries:
            cres = compile_extra(self.typing_context, self.context,
                                 entry.function,
                                 entry.signature.args,
                                 entry.signature.return_type, flags,
                                 locals={}, library=library)

            func_name = cres.fndesc.llvm_func_name
            llvm_func = cres.library.get_function(func_name)

            if self.export_python_wrap:
                llvm_func.linkage = lc.LINKAGE_INTERNAL
                wrappername = cres.fndesc.llvm_cpython_wrapper_name
                wrapper = cres.library.get_function(wrappername)
                wrapper.name = self._mangle_method_symbol(entry.symbol)
                wrapper.linkage = lc.LINKAGE_EXTERNAL
                fnty = cres.target_context.call_conv.get_function_type(
                    cres.fndesc.restype, cres.fndesc.argtypes)
                self.exported_function_types[entry] = fnty
                self.function_environments[entry] = cres.environment
                self.environment_gvs[entry] = cres.fndesc.env_name
            else:
                llvm_func.name = entry.symbol
                self.dll_exports.append(entry.symbol)

        if self.export_python_wrap:
            wrapper_module = library.create_ir_module("wrapper")
            self._emit_python_wrapper(wrapper_module)
            library.add_ir_module(wrapper_module)

        # Hide all functions in the DLL except those explicitly exported
        library.finalize()
        for fn in library.get_defined_functions():
            if fn.name not in self.dll_exports:
                if fn.linkage in {Linkage.private, Linkage.internal}:
                    # Private/Internal linkage must have "default" visibility
                    fn.visibility = "default"
                else:
                    fn.visibility = 'hidden'
        return library


class CCMod(CC):
    def __init__(self, extension_name, source_module=None, **flags):
        super().__init__(extension_name, source_module=source_module)
        self.flags = flags

    @global_compiler_lock
    def _compile_object_files(self, build_dir):
        compiler = ModuleCompilerMod(self._export_entries, self._basename,
                                     self._use_nrt, flags=self.flags, cpu_name=self._target_cpu)

        compiler.external_init_function = self._init_function
        temp_obj = os.path.join(build_dir,
                                os.path.splitext(self._output_file)[0] + '.o')
        log.info("generating LLVM code for '%s' into %s",
                 self._basename, temp_obj)
        compiler.write_native_object(temp_obj, wrap=True)

        return [temp_obj], compiler.dll_exports


ext_name = 'aot_functions'

cc_1 = CCMod(f'{ext_name}1', nogil=True, parallel=False)
cc_2 = CCMod(f'{ext_name}2', nogil=True, parallel=True)

# Print out the compilation steps
cc_1.verbose = True
cc_2.verbose = True


def foo(x):
    n, m = x.shape
    out = np.empty(n)
    for i in prange(n):
        out[i] = np.mean(x[i])
    return out


f1 = cc_1.export('foo', float64[:](float64[:, :]))(foo)
f2 = cc_2.export('foo', float64[:](float64[:, :]))(foo)

cc_1.compile()
cc_2.compile()

import aot_functions1 as aot1

_ = aot1.foo(np.random.rand(10, 10))

import aot_functions2 as aot2

"""
Traceback (most recent call last):
  File "/home/auderson/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ab29907692a6>", line 1, in <module>
    import aot_functions2 as aot2
ImportError: /home/auderson/aot_functions2.cpython-38-x86_64-linux-gnu.so: undefined symbol: numba_parallel_for
"""
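
The ImportError above means the dynamic loader could not resolve `numba_parallel_for` when loading the extension. One way to check a built module before importing it is to list its undefined dynamic symbols. The following is only a sketch: it assumes a Linux system with the binutils `nm` tool on the PATH, and the helper names `parse_undefined_symbols` and `undefined_numba_symbols` are hypothetical, not part of Numba.

```python
import subprocess


def parse_undefined_symbols(nm_output):
    """Return symbol names from the text printed by `nm -D --undefined-only`."""
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        # Undefined entries have no address: "U name" ("w"/"v" mark weak ones).
        if len(parts) == 2 and parts[0] in ("U", "w", "v"):
            # Drop a symbol-version suffix such as "@GLIBC_2.2.5".
            symbols.append(parts[1].split("@")[0])
    return symbols


def undefined_numba_symbols(so_path):
    """List undefined symbols starting with "numba_" in a shared object.

    Assumption: binutils `nm` is installed; this is not a Numba API.
    """
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", so_path],
        capture_output=True, text=True, check=True,
    )
    return [s for s in parse_undefined_symbols(result.stdout)
            if s.startswith("numba_")]
```

Calling `undefined_numba_symbols("aot_functions2.cpython-38-x86_64-linux-gnu.so")` on the module built above would show whether `numba_parallel_for` is among the symbols the loader has to find elsewhere. Most other undefined symbols (for example the Python C API) are resolved by the interpreter at import time and are not a problem in themselves.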


Top GitHub Comments

auderson commented, Jan 11, 2022

@gmarkall Thank you so much for taking the time to discuss these feature requests! I’ll follow up in #6382 and #3336 after closing this issue.

gmarkall commented, Jan 10, 2022

@auderson Many thanks for the request - we discussed this in the triage meeting today and there are two conclusions:

- Exposing the nogil flag for AOT compilation should be doable, and we can track that in #6382.
- Supporting parallel in AOT code is unlikely. AOT code doesn’t require Numba to be installed, so the threading layers would have to be packaged with the compiled code, which would mean a great deal of work and many issues to overcome. This already seems to be tracked in #3336.

Are there any separate points to discuss in this issue beyond those in #6382 and #3336? If not, would you be happy with us closing this issue and using the other two to track progress on these items?

Many thanks!
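
Since parallel is unlikely to be supported in AOT code, one possible workaround (not proposed in the thread) is to release the GIL in the AOT function and split the rows across Python threads by hand. The sketch below assumes the exported function accepts any contiguous block of rows and returns one result per row, as `foo` does above. The helper names `row_chunks` and `map_row_chunks` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def row_chunks(n_rows, n_chunks):
    """Split range(n_rows) into contiguous (start, stop) pairs of near-equal size."""
    n_chunks = max(1, min(n_chunks, n_rows))
    base, extra = divmod(n_rows, n_chunks)
    bounds, start = [], 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def map_row_chunks(func, x, n_threads=4):
    """Call func on contiguous row slices of x, one slice per worker thread.

    Returns the per-chunk results in row order. Threads only run
    concurrently if func releases the GIL while it works.
    """
    bounds = row_chunks(len(x), n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return list(pool.map(lambda b: func(x[b[0]:b[1]]), bounds))
```

With the module from the issue, this could be used as `np.concatenate(map_row_chunks(aot1.foo, x))`, assuming `aot1.foo` was built with the GIL released as described in the post. Without `parallel=True`, `prange` behaves like `range`, so each thread simply loops over its own rows.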
