DXGI_ERROR_DEVICE_REMOVED Error
Hello, I have a problem, and I don’t know whether anyone else has run into it. I have a Vega 8 with all drivers installed correctly, but I get the error DXGI_ERROR_DEVICE_REMOVED when I try to run the following script.
import tensorflow.compat.v1 as tf
tf.enable_eager_execution(tf.ConfigProto(log_device_placement=True))
print(tf.add([1.0, 2.0], [3.0, 4.0]))
I’ve already followed the instructions at https://aka.ms/tfdmltimeout, but the error persists. This is the output:
2021-03-31 11:29:36.810513: I tensorflow/stream_executor/platform/default/dso_loader.cc:98] Successfully opened dynamic library C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python/directml.bdb07c797e1e1af1b4a42d21c67ce5494d73991459.dll
2021-03-31 11:29:36.917148: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:126] DirectML device enumeration: found 1 compatible adapters.
[PhysicalDevice(name='/physical_device:DML:0', device_type='DML')]
2021-03-31 11:29:36.920996: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-03-31 11:29:36.925428: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:109] DirectML: creating device on adapter 0 (AMD Radeon(TM) Vega 8 Graphics)
2021-03-31 11:29:37.129830: E tensorflow/core/common_runtime/dml/dml_heap_allocator.cc:53] The DirectML device has encountered an unrecoverable error (DXGI_ERROR_DEVICE_REMOVED). This is most often caused by a timeout occurring on the GPU. Please visit https://aka.ms/tfdmltimeout for more information and troubleshooting steps.
2021-03-31 11:29:37.136448: F tensorflow/core/common_runtime/dml/dml_heap_allocator.cc:53] HRESULT failed with 0x887a0005: hr
I think this error is behind the failure I get when I try to run:
python detect_video.py --video data/grca-trainmix_1280x720.mp4 --trace --max_frames 10 --headless
WARNING:tensorflow:From detect_video.py:39: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
W0331 13:51:28.546197 3820 module_wrapper.py:139] From detect_video.py:39: The name tf.keras.backend.get_session is deprecated. Please use tf.compat.v1.keras.backend.get_session instead.
2021-03-31 13:51:28.806023: I tensorflow/stream_executor/platform/default/dso_loader.cc:98] Successfully opened dynamic library C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python/directml.bdb07c797e1af1b4a42d21c67ce5494d73991459.dll
2021-03-31 13:51:28.933164: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:126] DirectML device enumeration: found 1 compatible adapters.
2021-03-31 13:51:28.936741: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2
2021-03-31 13:51:28.940855: I tensorflow/core/common_runtime/dml/dml_device_cache.cc:109] DirectML: creating device on adapter 0 (AMD Radeon(TM) Vega 8 Graphics)
WARNING:tensorflow:From detect_video.py:46: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead.
W0331 13:51:29.155223 3820 module_wrapper.py:139] From detect_video.py:46: The name tf.RunOptions is deprecated. Please use tf.compat.v1.RunOptions instead.
WARNING:tensorflow:From C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
W0331 13:51:29.190702 3820 deprecation.py:506] From C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\ops\resource_variable_ops.py:1630: calling BaseResourceVariable.__init__ (from tensorflow.python.ops.resource_variable_ops) with constraint is deprecated and will be removed in a future version.
Instructions for updating:
If using Keras pass *_constraint arguments to layers.
Traceback (most recent call last):
File "detect_video.py", line 148, in <module>
app.run(main)
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\absl\app.py", line 303, in run
_run_main(main, args)
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\absl\app.py", line 251, in _run_main
sys.exit(main(argv))
File "detect_video.py", line 65, in main
yolo.load_weights(FLAGS.weights)
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\keras\engine\training.py", line 182, in load_weights
return super(Model, self).load_weights(filepath, by_name)
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\keras\engine\network.py", line 1339, in load_weights
pywrap_tensorflow.NewCheckpointReader(filepath)
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 877, in NewCheckpointReader
return CheckpointReader(compat.as_bytes(filepattern))
File "C:\Users\d.belgd\Miniconda3\envs\directml2\lib\site-packages\tensorflow_core\python\pywrap_tensorflow_internal.py", line 889, in __init__
this = _pywrap_tensorflow_internal.new_CheckpointReader(filename)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Unsuccessful TensorSliceReader constructor: Failed to get matching files on ./checkpoints/yolov3.tf: Not found: FindFirstFile failed for: ./checkpoints : The system cannot find the path specified.
; No such process
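Note that the traceback above ends in an InvalidArgumentError about a missing ./checkpoints path, not in a DXGI error, so it may be worth ruling out a missing weights file first. Below is a minimal sketch of such a check. It assumes the weights are a TensorFlow-format checkpoint, where the path is a prefix for an ".index" file plus one or more ".data-*" shards. The helper names checkpoint_files and describe_checkpoint are hypothetical and are not part of detect_video.py.

```python
import glob
import os


def checkpoint_files(prefix):
    """List the files belonging to a TF-format checkpoint saved under `prefix`.

    Assumption: saving to "./checkpoints/yolov3.tf" produces
    "yolov3.tf.index" and "yolov3.tf.data-*" files next to each other.
    """
    escaped = glob.escape(prefix)
    return sorted(glob.glob(escaped + ".index") + glob.glob(escaped + ".data-*"))


def describe_checkpoint(prefix):
    """Return a short reason why loading `prefix` would fail, or None if it looks fine."""
    directory = os.path.dirname(prefix) or "."
    if not os.path.isdir(directory):
        return "directory %r does not exist" % directory
    names = [os.path.basename(path) for path in checkpoint_files(prefix)]
    if not any(name.endswith(".index") for name in names):
        return "missing index file %r" % (prefix + ".index")
    if not any(".data-" in name for name in names):
        return "missing data shards matching %r" % (prefix + ".data-*")
    return None


if __name__ == "__main__":
    # Path taken from the error message in the traceback.
    problem = describe_checkpoint("./checkpoints/yolov3.tf")
    print(problem or "checkpoint files found")
```

Running this from the same working directory as detect_video.py shows whether the checkpoints folder and its files are where the script expects them.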
Issue Analytics
- State:
- Created 2 years ago
- Reactions:1
- Comments:6 (4 by maintainers)
Top GitHub Comments
In short: yes, DirectML supports access to dedicated memory!
DirectML itself doesn’t allocate memory for GPU resources: that’s up to the application/framework using it, such as TensorFlow-DirectML (TFDML) in this case. TFDML has a number of allocators for different purposes, but the bulk of the memory (to store the tensors used in GPU calculations) will be backed by subregions of a so-called default heap. Default heaps reflect different memory pools based on the GPU architecture (UMA or NUMA/discrete).
Your Radeon Vega 8 is an integrated GPU, so the 2GB of dedicated memory you see isn’t physical VRAM but rather reserved system memory. In other words, your system actually has 8GB of RAM, but the integrated GPU is claiming 2GB of it for exclusive access. This blog explains some of the differences between dedicated and shared memory, how they are reported in task manager, and some differences between discrete and integrated GPUs in this respect.
Integrated GPUs are, unfortunately, not going to be particularly fast in machine learning. It’s worth pointing out that we haven’t really optimized TFDML for integrated GPUs (e.g. we could avoid some memory copies since default-heap resources will always live in the “L0” memory pool); however, it’s unlikely that you’ll see huge performance gains over the CPU without using a more powerful discrete GPU.
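To picture what "subregions of a default heap" means in the comment above, here is a toy sketch of sub-allocation: one large block is reserved once, and each tensor gets an aligned (offset, size) slice of it. This is only an illustration and not TFDML's actual allocator. The class name HeapSuballocator, the 256-byte alignment, and the simple bump strategy are all assumptions made for the example.

```python
class HeapSuballocator:
    """Toy bump allocator handing out aligned regions of a single heap.

    Illustrative only: a real allocator also frees and reuses regions.
    """

    def __init__(self, heap_size, alignment=256):
        self.heap_size = heap_size    # total bytes reserved up front
        self.alignment = alignment    # hypothetical alignment requirement
        self.next_offset = 0          # first byte not yet handed out

    def allocate(self, size):
        """Return (offset, padded_size) for a new region, or raise MemoryError."""
        # Round the request up so the next offset stays aligned.
        padded = -(-size // self.alignment) * self.alignment
        if self.next_offset + padded > self.heap_size:
            raise MemoryError("heap exhausted")
        offset = self.next_offset
        self.next_offset += padded
        return offset, padded


if __name__ == "__main__":
    heap = HeapSuballocator(heap_size=1024)
    print(heap.allocate(100))   # first region starts at offset 0
    print(heap.allocate(300))   # second region starts after the first
```

On an integrated GPU such a heap would be carved out of system RAM, which is why the "dedicated" and "shared" figures both come from the same physical memory.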
@jstoecker and @adtsai, with the memory allocation it really did work. One thing I noticed, though, is that detect_video.py is using shared memory and not dedicated memory. Do you know whether DirectML supports access to dedicated memory? I ask because object detection is very slow.