Cudasim acting differently than Cuda (when allocating)
Hello!
I had a bug which I eventually tracked down, but while debugging I found that cudasim behaves differently from cuda: the simulator showed no bug, while the real CUDA target did. It looked like this:
```python
from numba import cuda, int32

n = 10  # on the CUDA target, cuda.local.array needs a compile-time-constant shape

@cuda.jit(device=True)
def f(n):
    a = cuda.local.array(n, int32)
    for i in range(n):
        a[i] = i
    return a  # returning the local array -- this is where the two modes diverge

@cuda.jit
def kernel(inp, out):
    a = f(n)
    # doing something with a
```
With cuda, `a` seems to be discarded, but with cudasim it keeps its values, which made debugging quite hard. Maybe in cudasim mode `a` could also be discarded?
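For what it's worth, the pattern that is expected to work (as far as I understand it) is to allocate the local array in the kernel that uses it and have the device function fill it, rather than returning the array. A minimal sketch, assuming a compile-time-constant size `N` and hypothetical names `fill`/`kernel`:

```python
from numba import cuda, int32

N = 10  # compile-time constant, as cuda.local.array requires on the CUDA target

@cuda.jit(device=True)
def fill(a, n):
    # write into the caller's array instead of returning a new local array
    for i in range(n):
        a[i] = i

@cuda.jit
def kernel(out):
    a = cuda.local.array(N, int32)  # allocated in the kernel that uses it
    fill(a, N)
    i = cuda.grid(1)
    if i < out.size and i < N:
        out[i] = a[i]
```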
I have another question: how do you allocate global memory inside a CUDA kernel? The only way I found is to pass the function an array that was allocated on the CPU side, and I can't find a counterpart to cuda.local.array.
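As far as I know, Numba does not expose device-side global allocation (there is no in-kernel equivalent of CUDA C's `malloc`), so the usual approach is to allocate the buffer on the host with `cuda.device_array` and pass it in; the data then lives in global memory on the device without ever being copied from the host. A minimal sketch (the `scratch` buffer and sizes are made up for illustration):

```python
import numpy as np
from numba import cuda

@cuda.jit
def kernel(scratch, out):
    i = cuda.grid(1)
    if i < scratch.size:
        scratch[i] = i           # scratch lives in global memory
        out[i] = 2 * scratch[i]

n = 256
scratch = cuda.device_array(n, dtype=np.int32)  # global memory, allocated host-side
out = cuda.device_array(n, dtype=np.int32)
kernel[(n + 127) // 128, 128](scratch, out)
print(out.copy_to_host()[:5])
```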
And another suggestion: in cudasim you can use print(), but in cuda it throws an error. I think it would be convenient to simply ignore print() in pure cuda mode, so there would be no need to comment out the print statements when switching between cuda and cudasim.
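For reference, more recent Numba versions document limited print() support inside CUDA kernels (comma-separated simple string and scalar arguments), and the simulator can be selected with the `NUMBA_ENABLE_CUDASIM` environment variable instead of editing code. A minimal sketch, assuming a recent Numba:

```python
# Run with NUMBA_ENABLE_CUDASIM=1 for the simulator; unset it for real CUDA.
# The kernel source stays the same in both modes.
from numba import cuda

@cuda.jit
def kernel(out):
    i = cuda.grid(1)
    if i < out.size:
        print("thread", i)  # simple scalar/string arguments only on the CUDA target
        out[i] = i
```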
Thanks!
Issue Analytics
- State: Closed
- Created 3 years ago
- Comments: 10 (5 by maintainers)
Here's the output:
Just checked this with Numba 0.54 RC:
I also noticed that this is returning a local array from a function, which isn’t expected to work (see also discussion in #7090), so I’m going to close this.