support for nogil and parallel in AOT functions
With the method I mentioned in this reply: https://github.com/numba/numba/issues/6382#issuecomment-972592827 I can actually generate AOT code with the GIL released. So the first feature request here is to expose an option for this compile flag.
If I try to enable parallel in the same way, the compilation finishes, but I cannot import the resulting module:
Traceback (most recent call last):
  File "/home/auderson/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ab29907692a6>", line 1, in <module>
    import aot_functions2 as aot2
ImportError: /home/auderson/aot_functions2.cpython-38-x86_64-linux-gnu.so: undefined symbol: numba_parallel_for
Is there any plan to add support for the parallel option? Related issue: https://github.com/numba/numba/issues/3336
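The ImportError above names the symbol the dynamic linker could not resolve. As a minimal sketch (the helper names `missing_symbol` and `try_import` are hypothetical and not part of Numba), the import can be guarded so that the missing symbol is reported rather than surfacing as an unhandled traceback:

```python
import importlib
import re

# Pattern for the dynamic-linker message seen in the traceback above.
_UNDEFINED_SYMBOL = re.compile(r"undefined symbol: (\S+)")


def missing_symbol(message):
    """Return the symbol named in a linker error message, or None."""
    match = _UNDEFINED_SYMBOL.search(message)
    return match.group(1) if match else None


def try_import(module_name):
    """Import a module by name.

    Returns (module, None) on success. On ImportError returns
    (None, detail), where detail is the unresolved symbol if the
    message names one, else the full error text.
    """
    try:
        return importlib.import_module(module_name), None
    except ImportError as exc:
        text = str(exc)
        return None, missing_symbol(text) or text
```

Calling `try_import("aot_functions2")` on the module built in this issue would be expected to return the unresolved symbol as the second element, assuming the error message has the same shape as the one shown.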
Code here:
import os
from distutils import log

import numpy as np
from llvmlite.binding import Linkage
import llvmlite.llvmpy.core as lc
from numba import njit, prange, float64
from numba.core import cpu
from numba.pycc import CC
from numba.pycc.compiler import (ModuleCompiler, global_compiler_lock, Flags,
                                 nrtdynmod, compile_extra)


class ModuleCompilerMod(ModuleCompiler):
    def __init__(self, export_entries, module_name, use_nrt=False, flags=None, **aot_options):
        super().__init__(export_entries, module_name, use_nrt=use_nrt, **aot_options)
        # Extra compile flags, e.g. {'nogil': True, 'parallel': False}
        self.flags = flags or {}

    @global_compiler_lock
    def _cull_exports(self):
        """Read all the exported functions/modules in the translator
        environment, and join them into a single LLVM module.
        """
        self.exported_function_types = {}
        self.function_environments = {}
        self.environment_gvs = {}

        codegen = self.context.codegen()
        library = codegen.create_library(self.module_name)

        # Generate IR for all exported functions
        flags = Flags()
        # Use .get() so that a missing key does not raise KeyError
        if self.flags.get('nogil'):
            flags.release_gil = True
        if self.flags.get('parallel'):
            flags.auto_parallel = cpu.ParallelOptions(True)
        flags.no_compile = True
        if not self.export_python_wrap:
            flags.no_cpython_wrapper = True
            flags.no_cfunc_wrapper = True
        if self.use_nrt:
            flags.nrt = True
            # Compile NRT helpers
            nrt_module, _ = nrtdynmod.create_nrt_module(self.context)
            library.add_ir_module(nrt_module)

        for entry in self.export_entries:
            cres = compile_extra(self.typing_context, self.context,
                                 entry.function,
                                 entry.signature.args,
                                 entry.signature.return_type, flags,
                                 locals={}, library=library)

            func_name = cres.fndesc.llvm_func_name
            llvm_func = cres.library.get_function(func_name)

            if self.export_python_wrap:
                llvm_func.linkage = lc.LINKAGE_INTERNAL
                wrappername = cres.fndesc.llvm_cpython_wrapper_name
                wrapper = cres.library.get_function(wrappername)
                wrapper.name = self._mangle_method_symbol(entry.symbol)
                wrapper.linkage = lc.LINKAGE_EXTERNAL
                fnty = cres.target_context.call_conv.get_function_type(
                    cres.fndesc.restype, cres.fndesc.argtypes)
                self.exported_function_types[entry] = fnty
                self.function_environments[entry] = cres.environment
                self.environment_gvs[entry] = cres.fndesc.env_name
            else:
                llvm_func.name = entry.symbol
                self.dll_exports.append(entry.symbol)

        if self.export_python_wrap:
            wrapper_module = library.create_ir_module("wrapper")
            self._emit_python_wrapper(wrapper_module)
            library.add_ir_module(wrapper_module)

        # Hide all functions in the DLL except those explicitly exported
        library.finalize()
        for fn in library.get_defined_functions():
            if fn.name not in self.dll_exports:
                if fn.linkage in {Linkage.private, Linkage.internal}:
                    # Private/Internal linkage must have "default" visibility
                    fn.visibility = "default"
                else:
                    fn.visibility = 'hidden'
        return library


class CCMod(CC):
    def __init__(self, extension_name, source_module=None, **flags):
        super().__init__(extension_name, source_module=source_module)
        self.flags = flags

    @global_compiler_lock
    def _compile_object_files(self, build_dir):
        compiler = ModuleCompilerMod(self._export_entries, self._basename,
                                     self._use_nrt, flags=self.flags,
                                     cpu_name=self._target_cpu)
        compiler.external_init_function = self._init_function
        temp_obj = os.path.join(build_dir,
                                os.path.splitext(self._output_file)[0] + '.o')
        log.info("generating LLVM code for '%s' into %s",
                 self._basename, temp_obj)
        compiler.write_native_object(temp_obj, wrap=True)
        return [temp_obj], compiler.dll_exports


ext_name = 'aot_functions'
cc_1 = CCMod(f'{ext_name}1', nogil=True, parallel=False)
cc_2 = CCMod(f'{ext_name}2', nogil=True, parallel=True)
# Print out the compilation steps
cc_1.verbose = True
cc_2.verbose = True


def foo(x):
    n, m = x.shape
    out = np.empty(n)
    for i in prange(n):
        out[i] = np.mean(x[i])
    return out


f1 = cc_1.export('foo', float64[:](float64[:, :]))(foo)
f2 = cc_2.export('foo', float64[:](float64[:, :]))(foo)
cc_1.compile()
cc_2.compile()

# The nogil-only module imports and runs
import aot_functions1 as aot1
_ = aot1.foo(np.random.rand(10, 10))

# The parallel module compiles but fails on import
import aot_functions2 as aot2
"""
Traceback (most recent call last):
  File "/home/auderson/miniconda3/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3444, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-5-ab29907692a6>", line 1, in <module>
    import aot_functions2 as aot2
ImportError: /home/auderson/aot_functions2.cpython-38-x86_64-linux-gnu.so: undefined symbol: numba_parallel_for
"""
Issue Analytics
- Created 2 years ago
- Comments: 6 (5 by maintainers)
@gmarkall Thank you so much for taking the time to discuss these FR! I’ll follow up in #6382 and #3336 after closing this issue.
@auderson Many thanks for the request - we discussed this in the triage meeting today and there are two conclusions:
- nogil flag for AOT compilation: this should be doable and we can track that in #6382.
Are there any separate points to discuss in this issue beyond those in #6382 and #3336? If not, would you be happy with us closing this issue and using the other two to track progress on these items?
Many thanks!
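While the parallel option is unavailable for AOT modules, one possible workaround, assuming the nogil build really does release the GIL, is to split the input by rows and call the compiled function from several threads. The sketch below is illustrative only: `map_row_chunks` is a hypothetical helper, and `row_means` is a pure-NumPy stand-in for `aot1.foo` so that the example runs without the compiled module.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def row_means(x):
    """Stand-in for the AOT-compiled aot1.foo: mean of each row."""
    return x.mean(axis=1)


def map_row_chunks(func, x, n_workers=4):
    """Apply func to contiguous row chunks of x in threads.

    func must map a 2D array to a 1D array with one value per row.
    Threads only run concurrently if func releases the GIL.
    """
    # Split along axis 0 into at most n_workers row blocks.
    chunks = np.array_split(x, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # pool.map preserves input order, so results line up with rows.
        parts = list(pool.map(func, chunks))
    return np.concatenate(parts)


x = np.random.rand(10, 10)
out = map_row_chunks(row_means, x, n_workers=3)
```

With the compiled module available, `map_row_chunks(aot1.foo, x)` would be the corresponding call. Whether it beats a single-threaded call depends on the array size and on thread overhead.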