[BUG] Illegal memory access when running data convert for criteo example
Describe the bug
When running optimize_criteo.ipynb, I encountered the following error. In addition, the program sometimes hangs with 100% GPU usage.
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
<ipython-input-1-2af48634df31> in <module>
47 del gdf
48 path_out = '/data/criteo/parquet/'
---> 49 file_to_pq(train_set, 'csv', output_folder=path_out, cols=cols, dtypes=dtypes)
<ipython-input-1-2af48634df31> in file_to_pq(target_files, file_type, output_folder, cols, dtypes)
43 if file_path != old_file_path:
44 writer = ParquetWriter(path)
---> 45 writer.write_table(gdf)
46 old_file_path = file_path
47 del gdf
cudf/_lib/parquet.pyx in cudf._lib.parquet.ParquetWriter.write_table()
RuntimeError: CUDA error encountered at: /cudf/cpp/src/io/parquet/writer_impl.cu:341: 700 cudaErrorIllegalAddress an illegal memory access was encountered
Steps/Code to reproduce bug
- Download and decompress the Criteo dataset (e.g., day_0.gz -> day_0)
- Launch jupyter
- Run optimize_criteo.ipynb
Expected behavior
No error and no hang.
Environment details (please complete the following information):
- Environment location: Docker
- Method of NVTabular install: Docker
- NGC’s container image: hash is 1567a4251e7f.
- Launch command:
sudo docker run --gpus=all --rm -it -v $(pwd):/ws -v /path/to/data/:/data -p 8888:8888 -p 8797:8787 -p 8796:8786 --ipc=host --cap-add SYS_PTRACE nvcr.io/nvidia/nvtabular:0.1 /bin/bash
- Other envs
- Host OS: Ubuntu 18.04.4
- GPU: TITAN X (Pascal)
- Driver version: 440.64.00
- Docker version: 19.03.8, build afacb8b7f0
Additional context
If needed, I can collect more information using debugging tools. Please let me know how I should proceed. Or should I file this issue on the cudf repo instead?
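As a rough sketch of how such debugging information could be gathered: the helper below builds and runs cuda-memcheck command lines for each of its tools. It assumes cuda-memcheck is on the PATH inside the container and that the notebook code has been exported to a script; test.py is the script name used later in this thread, and the function names build_command and run_under_memcheck are hypothetical, chosen only for illustration.

```python
import os
import subprocess

# Tools shipped with cuda-memcheck; memcheck is the default when --tool is omitted.
TOOLS = ("memcheck", "racecheck", "initcheck", "synccheck")


def build_command(script, tool="memcheck"):
    """Return the argv list for running `script` under one cuda-memcheck tool."""
    if tool not in TOOLS:
        raise ValueError(f"unknown cuda-memcheck tool: {tool}")
    return ["cuda-memcheck", "--tool", tool, "python", script]


def run_under_memcheck(script, tool="memcheck"):
    """Run the script under cuda-memcheck and capture stdout/stderr (hypothetical helper)."""
    # CUDA_LAUNCH_BLOCKING=1 makes kernel launches synchronous, so an error
    # is reported closer to the call that caused it. Optional for this sketch.
    env = dict(os.environ, CUDA_LAUNCH_BLOCKING="1")
    return subprocess.run(
        build_command(script, tool), env=env, capture_output=True, text=True
    )


if __name__ == "__main__":
    # Only print the command lines here; call run_under_memcheck() on a GPU machine.
    for tool in TOOLS:
        print(" ".join(build_command("test.py", tool)))
```

Running each tool separately keeps the logs manageable, since racecheck and initcheck can produce a large volume of messages.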
Issue Analytics
- State:
- Created 3 years ago
- Comments:10 (3 by maintainers)
Top GitHub Comments
@OlivierNV Did you mean the command should be changed from, for example,
cuda-memcheck --tool racecheck python test.py
to
cuda-memcheck python test.py
? If so, I got no errors from cuda-memcheck, as shown below. Unfortunately, without cuda-memcheck, the error happened again. Full console outputs are given below.
stdout
Note that I added several progress logs.
stderr
Let’s go ahead and file an issue with the cudf team. Unfortunately, I have not been able to reproduce this due to resource limitations.