[BUG] Illegal memory access when running data convert for criteo example

Describe the bug

When running optimize_criteo.ipynb, I’ve encountered the following error. In addition, the program sometimes hangs with 100% GPU usage…

---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
<ipython-input-1-2af48634df31> in <module>
     47         del gdf
     48 path_out = '/data/criteo/parquet/'
---> 49 file_to_pq(train_set, 'csv', output_folder=path_out, cols=cols, dtypes=dtypes)

<ipython-input-1-2af48634df31> in file_to_pq(target_files, file_type, output_folder, cols, dtypes)
     43         if file_path != old_file_path:
     44             writer = ParquetWriter(path)
---> 45         writer.write_table(gdf)
     46         old_file_path = file_path
     47         del gdf

cudf/_lib/parquet.pyx in cudf._lib.parquet.ParquetWriter.write_table()

RuntimeError: CUDA error encountered at: /cudf/cpp/src/io/parquet/writer_impl.cu:341: 700 cudaErrorIllegalAddress an illegal memory access was encountered

Steps/Code to reproduce bug

  • Download and decompress the Criteo dataset (e.g., day_0.gz -> day_0)
  • Launch Jupyter
  • Run optimize_criteo.ipynb
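
The decompression step above might be scripted as follows. This is a minimal sketch using only the Python standard library; the helper name `decompress_gz` and the chunk size are assumptions made for illustration, not part of the notebook.

```python
import gzip
import shutil
from pathlib import Path


def decompress_gz(src: Path, dst_dir: Path) -> Path:
    """Decompress src (e.g. day_0.gz) into dst_dir and return the output path.

    Hypothetical helper for illustration; not taken from optimize_criteo.ipynb.
    """
    dst_dir.mkdir(parents=True, exist_ok=True)
    dst = dst_dir / src.stem  # "day_0.gz" -> "day_0"
    with gzip.open(src, "rb") as f_in, open(dst, "wb") as f_out:
        # Stream in chunks so the whole file is never held in memory at once.
        shutil.copyfileobj(f_in, f_out, length=16 * 1024 * 1024)
    return dst
```

For example, `decompress_gz(Path("day_0.gz"), Path("/data/criteo/org"))` would write `/data/criteo/org/day_0`, assuming the archive sits in the current directory.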

Expected behavior

The notebook runs to completion with no error and no hang.

Environment details (please complete the following information):

  • Environment location: Docker
  • Method of NVTabular install: Docker
    • NGC’s container image: hash is 1567a4251e7f.
    • Launch command: sudo docker run --gpus=all --rm -it -v $(pwd):/ws -v /path/to/data/:/data -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
  • Other envs
    • Host OS: Ubuntu 18.04.4
    • GPU: TITAN X (Pascal)
    • Driver version: 440.64.00
    • Docker version: Docker version 19.03.8, build afacb8b7f0

Additional context

If needed, I can collect more information with debugging tools; please let me know what would be useful. Alternatively, should I file this issue on the cudf repo?

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Comments: 10 (3 by maintainers)

Top GitHub Comments

lazykyama commented, Jul 2, 2020 (1 reaction)

@OlivierNV Did you mean that the command should be changed from, for example, cuda-memcheck --tool racecheck python test.py to cuda-memcheck python test.py? If so, cuda-memcheck reported no errors, as shown below. Unfortunately, without cuda-memcheck, the error happened again…

========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors

The full console output is shown below.

stdout

Note that I added several progress logs.

targets: ['/data/criteo/org/day_0', '/data/criteo/org/day_1']
[0]: /data/criteo/parquet/day_0
         2020-07-01 23:14:23.470198 duration=4.284763051196933[sec.]
[1]: /data/criteo/parquet/day_0
         2020-07-01 23:14:50.606910 duration=4.158812602981925[sec.]
[2]: /data/criteo/parquet/day_0
         2020-07-01 23:15:17.779998 duration=4.287690730765462[sec.]
[3]: /data/criteo/parquet/day_0
         2020-07-01 23:15:45.152566 duration=4.260857192799449[sec.]
[4]: /data/criteo/parquet/day_0
... (similar logs repeated) ...
[164]: /data/criteo/parquet/day_1
         2020-07-02 00:35:11.787064 duration=4.206702688708901[sec.]
[165]: /data/criteo/parquet/day_1
         2020-07-02 00:35:28.438073 duration=1.9731408972293139[sec.]
========= CUDA-MEMCHECK
========= ERROR SUMMARY: 0 errors

stderr

/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_NVVM=/usr/local/cuda/nvvm/lib64/libnvvm.so.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
  warnings.warn(errors.NumbaWarning(msg))
/conda/envs/rapids/lib/python3.7/site-packages/numba/cuda/envvars.py:17: NumbaWarning:
Environment variables with the 'NUMBAPRO' prefix are deprecated and consequently ignored, found use of NUMBAPRO_LIBDEVICE=/usr/local/cuda/nvvm/libdevice/.

For more information about alternatives visit: ('http://numba.pydata.org/numba-doc/latest/cuda/overview.html', '#cudatoolkit-lookup')
  warnings.warn(errors.NumbaWarning(msg))
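
When the script is run repeatedly under cuda-memcheck with different tools, the `ERROR SUMMARY` line shown above could be checked programmatically. The following is only a sketch: the function names are hypothetical, the regular expression is based solely on the summary format in the log above, and `run_under_memcheck` assumes `cuda-memcheck` is on `PATH`.

```python
import re
import subprocess
import sys
from typing import List, Optional

# Matches lines such as "========= ERROR SUMMARY: 0 errors".
SUMMARY_RE = re.compile(r"^=+\s*ERROR SUMMARY:\s*(\d+)\s+errors?", re.MULTILINE)


def parse_error_summary(output: str) -> Optional[int]:
    """Return the count from the last ERROR SUMMARY line, or None if absent."""
    matches = SUMMARY_RE.findall(output)
    return int(matches[-1]) if matches else None


def run_under_memcheck(script: str, tool: Optional[str] = None) -> Optional[int]:
    """Run `script` under cuda-memcheck and return the reported error count.

    tool=None uses the default memcheck tool; tool="racecheck" corresponds to
    `cuda-memcheck --tool racecheck python test.py` from the comment above.
    """
    cmd: List[str] = ["cuda-memcheck"]
    if tool is not None:
        cmd += ["--tool", tool]
    cmd += [sys.executable, script]
    proc = subprocess.run(cmd, capture_output=True, text=True)
    # The summary may land on either stream, so search both.
    return parse_error_summary(proc.stdout + proc.stderr)
```

A return value of `0` only means the tool found nothing in that run; as the comment above notes, the failure can still occur when the script runs without cuda-memcheck.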
jperez999 commented, Jun 26, 2020 (1 reaction)

Let’s go ahead and file an issue with the cudf team. Unfortunately, I have not been able to reproduce this due to resource limitations.
