Couldn't get temp CUBIN file name - TensorFlow XLA
Hi everyone, I am really struggling to find a solution to this problem. It happens when I run the server with a TensorFlow model on the GPUs; I get this error (full log):
=============================
== Triton Inference Server ==
=============================
NVIDIA Release 22.05 (build 38317651)
Triton Server Version 2.22.0
Copyright (c) 2018-2022, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
Various files include modifications (c) NVIDIA CORPORATION & AFFILIATES. All rights reserved.
This container image and its contents are governed by the NVIDIA Deep Learning Container License.
By pulling and using the container, you accept the terms and conditions of this license:
https://developer.nvidia.com/ngc/nvidia-deep-learning-container-license
WARNING: CUDA Minor Version Compatibility mode ENABLED.
Using driver version 510.47.03 which has support for CUDA 11.6. This container
was built with CUDA 11.7 and will be run in Minor Version Compatibility mode.
CUDA Forward Compatibility is preferred over Minor Version Compatibility for use
with this container but was unavailable:
[[Forward compatibility was attempted on non supported HW (CUDA_ERROR_COMPAT_NOT_SUPPORTED_ON_DEVICE) cuInit()=804]]
See https://docs.nvidia.com/deploy/cuda-compatibility/ for details.
I0804 13:52:20.517783 1 pinned_memory_manager.cc:240] Pinned memory pool is created at '0x7fa64e000000' with size 268435456
I0804 13:52:20.517783 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 0 with size 67108864
I0804 13:52:20.517783 1 cuda_memory_manager.cc:105] CUDA memory pool is created on device 1 with size 67108864
W0804 13:52:20.973811 1 server.cc:213] failed to enable peer access for some device pairs
I0804 13:52:20.973811 1 model_config_utils.cc:645] Server side auto-completed config: name: "tf_model"
platform: "tensorflow_savedmodel"
version_policy {
latest {
num_versions: 1
}
}
max_batch_size: 128
input {
name: "inputs"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: 300
dims: 300
dims: 3
}
output {
name: "detection_boxes"
data_type: TYPE_FP32
dims: 100
dims: 4
}
output {
name: "detection_classes"
data_type: TYPE_FP32
dims: 100
label_filename: "label_map.pbtxt"
}
output {
name: "detection_scores"
data_type: TYPE_FP32
dims: 100
}
instance_group {
count: 1
}
default_model_filename: "model.savedmodel"
dynamic_batching {
preferred_batch_size: 1
preferred_batch_size: 2
preferred_batch_size: 4
preferred_batch_size: 8
preferred_batch_size: 16
preferred_batch_size: 32
preferred_batch_size: 64
preferred_batch_size: 128
max_queue_delay_microseconds: 30000
preserve_ordering: true
}
optimization {
graph {
level: 1
}
}
model_warmup {
name: "warmup_1"
batch_size: 1
inputs {
key: "inputs"
value {
data_type: TYPE_UINT8
dims: 300
dims: 300
dims: 3
zero_data: true
}
}
}
backend: "tensorflow"
response_cache {
}
I0804 13:52:20.977811 1 model_repository_manager.cc:1191] loading: tf_model:1
I0804 13:52:21.077818 1 backend_model.cc:292] Adding default backend config setting: default-max-batch-size,4
I0804 13:52:21.077818 1 shared_library.cc:108] OpenLibraryHandle: /opt/tritonserver/backends/tensorflow2/libtriton_tensorflow2.so
I0804 13:52:21.769861 1 tensorflow.cc:2181] TRITONBACKEND_Initialize: tensorflow
I0804 13:52:21.769861 1 tensorflow.cc:2191] Triton TRITONBACKEND API version: 1.9
I0804 13:52:21.769861 1 tensorflow.cc:2197] 'tensorflow' TRITONBACKEND API version: 1.9
I0804 13:52:21.769861 1 tensorflow.cc:2221] backend configuration:
{"cmdline":{"auto-complete-config":"false","backend-directory":"/opt/tritonserver/backends","min-compute-capability":"6.000000","version":"2","default-max-batch-size":"4"}}
I0804 13:52:21.769861 1 tensorflow.cc:2281] TRITONBACKEND_ModelInitialize: tf_model (version 1)
I0804 13:52:21.769861 1 model_config_utils.cc:1597] ModelConfig 64-bit fields:
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::default_queue_policy::default_timeout_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::max_queue_delay_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::dynamic_batching::priority_queue_policy::value::default_timeout_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::ensemble_scheduling::step::model_version
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::input::dims
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::input::reshape::shape
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::instance_group::secondary_devices::device_id
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::model_warmup::inputs::value::dims
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::graph_lower_bound::input::value::dim
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::optimization::cuda::graph_spec::input::value::dim
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::output::dims
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::output::reshape::shape
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::direct::max_queue_delay_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::max_sequence_idle_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::oldest::max_queue_delay_microseconds
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::dims
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::sequence_batching::state::initial_state::dims
I0804 13:52:21.769861 1 model_config_utils.cc:1599] ModelConfig::version_policy::specific::versions
I0804 13:52:21.769861 1 tensorflow.cc:1437] model configuration:
{
"name": "tf_model",
"platform": "tensorflow_savedmodel",
"backend": "tensorflow",
"version_policy": {
"latest": {
"num_versions": 1
}
},
"max_batch_size": 128,
"input": [
{
"name": "inputs",
"data_type": "TYPE_UINT8",
"format": "FORMAT_NHWC",
"dims": [
300,
300,
3
],
"is_shape_tensor": false,
"allow_ragged_batch": false,
"optional": false
}
],
"output": [
{
"name": "detection_boxes",
"data_type": "TYPE_FP32",
"dims": [
100,
4
],
"label_filename": "",
"is_shape_tensor": false
},
{
"name": "detection_classes",
"data_type": "TYPE_FP32",
"dims": [
100
],
"label_filename": "label_map.pbtxt",
"is_shape_tensor": false
},
{
"name": "detection_scores",
"data_type": "TYPE_FP32",
"dims": [
100
],
"label_filename": "",
"is_shape_tensor": false
}
],
"batch_input": [],
"batch_output": [],
"optimization": {
"graph": {
"level": 1
},
"priority": "PRIORITY_DEFAULT",
"input_pinned_memory": {
"enable": true
},
"output_pinned_memory": {
"enable": true
},
"gather_kernel_buffer_threshold": 0,
"eager_batching": false
},
"dynamic_batching": {
"preferred_batch_size": [
1,
2,
4,
8,
16,
32,
64,
128
],
"max_queue_delay_microseconds": 30000,
"preserve_ordering": true,
"priority_levels": 0,
"default_priority_level": 0,
"priority_queue_policy": {}
},
"instance_group": [
{
"name": "tf_model_0",
"kind": "KIND_GPU",
"count": 1,
"gpus": [
0,
1
],
"secondary_devices": [],
"profile": [],
"passive": false,
"host_policy": ""
}
],
"default_model_filename": "model.savedmodel",
"cc_model_filenames": {},
"metric_tags": {},
"parameters": {},
"model_warmup": [
{
"name": "warmup_1",
"batch_size": 1,
"inputs": {
"inputs": {
"data_type": "TYPE_UINT8",
"dims": [
300,
300,
3
],
"zero_data": true
}
}
}
],
"response_cache": {
"enable": false
}
}
I0804 13:52:21.773861 1 tensorflow.cc:2330] TRITONBACKEND_ModelInstanceInitialize: tf_model_0 (GPU device 0)
I0804 13:52:21.773861 1 backend_model_instance.cc:105] Creating instance tf_model_0 on GPU 0 (7.5) using artifact 'model.savedmodel'
2022-08-04 13:52:23.273955: I tensorflow/cc/saved_model/reader.cc:43] Reading SavedModel from: /models/tf_model/1/model.savedmodel
2022-08-04 13:52:23.609976: I tensorflow/cc/saved_model/reader.cc:78] Reading meta graph with tags { serve }
2022-08-04 13:52:23.609976: I tensorflow/cc/saved_model/reader.cc:119] Reading SavedModel debug info (if present) from: /models/tf_model/1/model.savedmodel
2022-08-04 13:52:23.609976: I tensorflow/core/platform/cpu_feature_guard.cc:152] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN) to use the following CPU instructions in performance-critical operations: SSE3 SSE4.1 SSE4.2 AVX
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2022-08-04 13:52:23.613976: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:23.613976: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:23.709982: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:23.709982: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:23.709982: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:23.713982: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.342084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.342084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.342084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.342084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.346084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.346084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:0 with 1422 MB memory: -> device: 0, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:03:00.0, compute capability: 7.5
2022-08-04 13:52:25.346084: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:952] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2022-08-04 13:52:25.346084: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1525] Created device /job:localhost/replica:0/task:0/device:GPU:1 with 2762 MB memory: -> device: 1, name: NVIDIA GeForce RTX 2080 Ti, pci bus id: 0000:82:00.0, compute capability: 7.5
2022-08-04 13:52:26.422152: I tensorflow/cc/saved_model/loader.cc:230] Restoring SavedModel bundle.
2022-08-04 13:52:30.394400: I tensorflow/compiler/xla/service/service.cc:171] XLA service 0x7fa4cc031100 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2022-08-04 13:52:30.394400: I tensorflow/compiler/xla/service/service.cc:179] StreamExecutor device (0): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2022-08-04 13:52:30.394400: I tensorflow/compiler/xla/service/service.cc:179] StreamExecutor device (1): NVIDIA GeForce RTX 2080 Ti, Compute Capability 7.5
2022-08-04 13:52:30.770423: I tensorflow/compiler/jit/xla_compilation_cache.cc:402] Compiled cluster using XLA! This line is logged at most once for the lifetime of the process.
2022-08-04 13:52:30.790425: I tensorflow/cc/saved_model/loader.cc:214] Running initialization op on SavedModel bundle at path: /models/tf_model/1/model.savedmodel
2022-08-04 13:52:33.562598: I tensorflow/cc/saved_model/loader.cc:321] SavedModel load for tags { serve }; Status: success: OK. Took 10288643 microseconds.
I0804 13:52:33.566598 1 backend_model_instance.cc:347] Generating warmup sample data for 'warmup_1'
I0804 13:52:33.566598 1 pinned_memory_manager.cc:161] pinned memory allocation: size 270000, addr 0x7fa64e000090
I0804 13:52:33.566598 1 infer_request.cc:710] prepared: [0x0x7fa631362120] request id: , model: tf_model, requested version: 1, actual version: 1, flags: 0x0, correlation id: 0, batch size: 1, priority: 0, timeout (us): 0
original inputs:
[0x0x7fa6312b0118] input: inputs, type: UINT8, original shape: [1,300,300,3], batch + shape: [1,300,300,3], shape: [300,300,3]
override inputs:
inputs:
[0x0x7fa6312b0118] input: inputs, type: UINT8, original shape: [1,300,300,3], batch + shape: [1,300,300,3], shape: [300,300,3]
original requested outputs:
requested outputs:
detection_boxes
detection_classes
detection_scores
I0804 13:52:33.566598 1 rate_limiter.cc:778]
Max Resource Map===>
I0804 13:52:33.566598 1 backend_model_instance.cc:687] Starting backend thread for tf_model_0 at nice 0 on device 0...
I0804 13:52:33.566598 1 backend_model_instance.cc:551] model 'tf_model' instance tf_model_0 is running warmup sample 'warmup_1'
I0804 13:52:33.566598 1 tensorflow.cc:2401] model tf_model, instance tf_model_0, executing 1 requests
I0804 13:52:33.566598 1 tensorflow.cc:1575] TRITONBACKEND_ModelExecute: Running tf_model_0 with 1 requests
I0804 13:52:33.566598 1 tensorflow.cc:1827] TRITONBACKEND_ModelExecute: input 'inputs' is GPU tensor: false
2022-08-04 13:52:43.860583: F tensorflow/compiler/xla/service/gpu/nvptx_compiler.cc:622] ptxas returned an error during compilation of ptx to sass: 'INTERNAL: couldn't get temp CUBIN file name' If the error message indicates that a file could not be written, please verify that sufficient filesystem space is provided.
config:
name: "tf_model"
platform: "tensorflow_savedmodel"
max_batch_size: 128
input [
{
name: "inputs"
data_type: TYPE_UINT8
format: FORMAT_NHWC
dims: [300, 300, 3]
}
]
output [
{
name: "detection_boxes"
data_type: TYPE_FP32
dims: [100, 4]
},
{
name: "detection_classes"
data_type: TYPE_FP32
dims: [100]
label_filename: "label_map.pbtxt"
},
{
name: "detection_scores"
data_type: TYPE_FP32
dims: [100]
}
]
model_warmup [
{
name : "warmup_1"
batch_size: 1
inputs {
key: "inputs"
value: {
data_type: TYPE_UINT8
dims: [300, 300, 3]
zero_data: true
}
}
}]
dynamic_batching {
preferred_batch_size: [1, 2, 4, 8, 16, 32, 64, 128]
max_queue_delay_microseconds: 30000
preserve_ordering: true
}
optimization {
graph {
level: 1
}
}
version_policy: {
latest {
num_versions: 1
}
}
instance_group [
{
count: 1
kind: KIND_AUTO
#gpus: [0]
# rate_limiter {
# resources [
# {
# name: "R1"
# count: 1
# },
# {
# name: "R2"
# count: 2
# global: true
# }
# ]
# priority: 3
# }
}
]
response_cache {
enable: false
}
command:
docker run --name triton --gpus '"device=0,2"' --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
-v SOME_PATH:/models \
nvcr.io/nvidia/tritonserver:22.05-py3 tritonserver \
--model-repository=/models \
--rate-limit=execution_count \
--backend-config=tensorflow,version=2 \
--log-verbose=1
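Since the fatal line asks to "verify that sufficient filesystem space is provided", a variant of this command worth trying is one that guarantees a writable, non-full temp directory inside the container. This is a sketch, not a confirmed fix: the --tmpfs mount and the TMPDIR variable are assumptions to test (TensorFlow's temp-file helper consults TMPDIR before falling back to /tmp, as far as I know):
docker run --name triton --gpus '"device=0,2"' --rm -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    --tmpfs /tmp:rw,size=2g \
    -e TMPDIR=/tmp \
    -v SOME_PATH:/models \
    nvcr.io/nvidia/tritonserver:22.05-py3 tritonserver \
    --model-repository=/models \
    --rate-limit=execution_count \
    --backend-config=tensorflow,version=2 \
    --log-verbose=1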
In case it's needed:
$ nvidia-smi -a
==============NVSMI LOG==============
Timestamp : Thu Aug 4 16:22:16 2022
Driver Version : 510.47.03
CUDA Version : 11.6
Attached GPUs : 3
GPU 00000000:03:00.0
Product Name : NVIDIA GeForce RTX 2080 Ti
Product Brand : GeForce
Product Architecture : Turing
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-ccc43d63-3504-7626-8819-6c7fb9848ff3
Minor Number : 0
VBIOS Version : 90.02.17.40.78
MultiGPU Board : No
Board ID : 0x300
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x03
Device : 0x00
Domain : 0x0000
Device Id : 0x1E0710DE
Bus Id : 00000000:03:00.0
Sub System Id : 0x150319DA
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 16x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 0 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11264 MiB
Reserved : 245 MiB
Used : 1 MiB
Free : 11017 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 3 MiB
Free : 253 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 60 C
GPU Shutdown Temp : 94 C
GPU Slowdown Temp : 91 C
GPU Max Operating Temp : 89 C
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 30.00 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 280.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7000 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
GPU 00000000:05:00.0
Product Name : NVIDIA GeForce GTX 980 Ti
Product Brand : GeForce
Product Architecture : Maxwell
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-311f2bdf-da1c-f12e-b3ac-eaeab769dacd
Minor Number : 1
VBIOS Version : 84.00.41.00.4C
MultiGPU Board : No
Board ID : 0x500
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : N/A
OEM Object : N/A
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x05
Device : 0x00
Domain : 0x0000
Device Id : 0x17C810DE
Bus Id : 00000000:05:00.0
Sub System Id : 0x1133196E
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 44 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : N/A
HW Power Brake Slowdown : N/A
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 6144 MiB
Reserved : 59 MiB
Used : 1 MiB
Free : 6082 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 4 MiB
Free : 252 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Aggregate
Single Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Double Bit
Device Memory : N/A
Register File : N/A
L1 Cache : N/A
L2 Cache : N/A
Texture Memory : N/A
Texture Shared : N/A
CBU : N/A
Total : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 54 C
GPU Shutdown Temp : 97 C
GPU Slowdown Temp : 92 C
GPU Max Operating Temp : N/A
GPU Target Temperature : 83 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 16.91 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 150.00 W
Max Power Limit : 275.00 W
Clocks
Graphics : 135 MHz
SM : 135 MHz
Memory : 405 MHz
Video : 405 MHz
Applications Clocks
Graphics : 1164 MHz
Memory : 3505 MHz
Default Applications Clocks
Graphics : 1164 MHz
Memory : 3505 MHz
Max Clocks
Graphics : 1519 MHz
SM : 1519 MHz
Memory : 3505 MHz
Video : 1397 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
GPU 00000000:82:00.0
Product Name : NVIDIA GeForce RTX 2080 Ti
Product Brand : GeForce
Product Architecture : Turing
Display Mode : Disabled
Display Active : Disabled
Persistence Mode : Enabled
MIG Mode
Current : N/A
Pending : N/A
Accounting Mode : Disabled
Accounting Mode Buffer Size : 4000
Driver Model
Current : N/A
Pending : N/A
Serial Number : N/A
GPU UUID : GPU-28fd72f2-03f8-fa0d-4910-bb1010eaaa5f
Minor Number : 2
VBIOS Version : 90.02.17.40.78
MultiGPU Board : No
Board ID : 0x8200
GPU Part Number : N/A
Module ID : 0
Inforom Version
Image Version : G001.0000.02.04
OEM Object : 1.1
ECC Object : N/A
Power Management Object : N/A
GPU Operation Mode
Current : N/A
Pending : N/A
GSP Firmware Version : N/A
GPU Virtualization Mode
Virtualization Mode : None
Host VGPU Mode : N/A
IBMNPU
Relaxed Ordering Mode : N/A
PCI
Bus : 0x82
Device : 0x00
Domain : 0x0000
Device Id : 0x1E0710DE
Bus Id : 00000000:82:00.0
Sub System Id : 0x150319DA
GPU Link Info
PCIe Generation
Max : 2
Current : 1
Link Width
Max : 16x
Current : 8x
Bridge Chip
Type : N/A
Firmware : N/A
Replays Since Reset : 0
Replay Number Rollovers : 0
Tx Throughput : 0 KB/s
Rx Throughput : 0 KB/s
Fan Speed : 47 %
Performance State : P8
Clocks Throttle Reasons
Idle : Active
Applications Clocks Setting : Not Active
SW Power Cap : Not Active
HW Slowdown : Not Active
HW Thermal Slowdown : Not Active
HW Power Brake Slowdown : Not Active
Sync Boost : Not Active
SW Thermal Slowdown : Not Active
Display Clock Setting : Not Active
FB Memory Usage
Total : 11264 MiB
Reserved : 244 MiB
Used : 1 MiB
Free : 11018 MiB
BAR1 Memory Usage
Total : 256 MiB
Used : 3 MiB
Free : 253 MiB
Compute Mode : Default
Utilization
Gpu : 0 %
Memory : 0 %
Encoder : 0 %
Decoder : 0 %
Encoder Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
FBC Stats
Active Sessions : 0
Average FPS : 0
Average Latency : 0
Ecc Mode
Current : N/A
Pending : N/A
ECC Errors
Volatile
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Aggregate
SRAM Correctable : N/A
SRAM Uncorrectable : N/A
DRAM Correctable : N/A
DRAM Uncorrectable : N/A
Retired Pages
Single Bit ECC : N/A
Double Bit ECC : N/A
Pending Page Blacklist : N/A
Remapped Rows : N/A
Temperature
GPU Current Temp : 65 C
GPU Shutdown Temp : 94 C
GPU Slowdown Temp : 91 C
GPU Max Operating Temp : 89 C
GPU Target Temperature : 84 C
Memory Current Temp : N/A
Memory Max Operating Temp : N/A
Power Readings
Power Management : Supported
Power Draw : 21.67 W
Power Limit : 250.00 W
Default Power Limit : 250.00 W
Enforced Power Limit : 250.00 W
Min Power Limit : 100.00 W
Max Power Limit : 280.00 W
Clocks
Graphics : 300 MHz
SM : 300 MHz
Memory : 405 MHz
Video : 540 MHz
Applications Clocks
Graphics : N/A
Memory : N/A
Default Applications Clocks
Graphics : N/A
Memory : N/A
Max Clocks
Graphics : 2100 MHz
SM : 2100 MHz
Memory : 7000 MHz
Video : 1950 MHz
Max Customer Boost Clocks
Graphics : N/A
Clock Policy
Auto Boost : N/A
Auto Boost Default : N/A
Voltage
Graphics : N/A
Processes : None
It looks like XLA is trying to create a temp CUBIN file path but can't (?). For reference, the error string comes from TensorFlow's stream_executor/cuda/ptxas_utils code.
In addition, when I comment out the optimization part of the config, or when running only on CPU (KIND_CPU), the error doesn't occur (see the sketch below).
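In config.pbtxt terms, that workaround amounts to neutralizing the graph optimization block; a minimal sketch (my reading of the log is that level 1 is what enabled the JIT/XLA compilation path, but that is an inference, not a documented fact):
optimization {
  graph {
    level: 0  # default; with level: 1 the server went down the XLA/ptxas path that crashed
  }
}
or simply omit the optimization block entirely.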
Does anyone have any idea how to solve this, please?
Thank you in advance!
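For anyone hitting the same fatal line: the error text itself says to verify filesystem space, so the first things to check are whether the container's temp filesystem is full or read-only, and whether ptxas (the compiler XLA invokes for the PTX-to-SASS step) is present. A minimal check, assuming the container name triton from the command above:
$ docker exec triton df -h / /tmp
$ docker exec triton bash -c 'f=$(mktemp) && echo ok > "$f" && cat "$f" && rm -f "$f"'
$ docker exec triton which ptxas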
Read more >Top Related Medium Post
No results found
Top Related StackOverflow Question
No results found
Troubleshoot Live Code
Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start FreeTop Related Reddit Thread
No results found
Top Related Hackernoon Post
No results found
Top Related Tweet
No results found
Top Related Dev.to Post
No results found
Top Related Hashnode Post
No results found
Top GitHub Comments
Interesting. Thanks for the update, and glad you were able to resolve it. I don't really know what could have gone wrong, but the cause appears to be the GPU state. Closing this as it appears to be model-related. Please open a new issue if you have reason to believe that Triton is causing it.
Hi Tanmay, thanks for your reply!
Yes, it does.
I did, and it worked too; I tried both with CPU and GPU.
FYI, on my host machine I have CUDA 11.2 and cuDNN 8.1, and I had a TF object detection training running while I was working with Triton (note that I am using TF 2.8). Unfortunately, the problem started to appear there as well, although the earlier logs of that training didn't show those warning messages. I am still trying to find the cause; maybe it's my machine (?).
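A quick way to cross-check that version mix on the host (a sketch; tf.sysconfig.get_build_info() reports the CUDA version TF was built against, which can legitimately differ from the host toolkit):
$ nvidia-smi --query-gpu=name,driver_version --format=csv
$ nvcc --version
$ python -c "import tensorflow as tf; print(tf.__version__, tf.sysconfig.get_build_info().get('cuda_version'))"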