FP64 Emulation Support is Broken; Cannot Run Own Scripts

Original script (bench_mixed-precision_vs_single-precision_pytorch_ipex.py) I am running using IPEX:

import torch
import intel_extension_for_pytorch as ipex
import matplotlib.pyplot as plt
import time
import sys
torch.set_default_tensor_type(torch.FloatTensor)

# Comment out the following two lines if running this script in a ROCm environment
# import torch.backends.cudnn as cudnn
# cudnn.benchmark = True

def grid(width, height):
  hrange = torch.arange(width).unsqueeze(0).repeat([height, 1]).div(width)
  vrange = torch.arange(height).unsqueeze(1).repeat([1, width]).div(height)
  output = torch.stack([hrange, vrange], 0).float()
  return output


def checker(width, height, freq):
  hrange = torch.arange(width).reshape([1, width]).mul(freq / width / 2.0).fmod(1.0).gt(0.5)
  vrange = torch.arange(height).reshape([height, 1]).mul(freq / height / 2.0).fmod(1.0).gt(0.5)
  output = hrange.logical_xor(vrange).float()
  return output

if len(sys.argv) > 1 and sys.argv[1] != "bench_mixed_precision":
    print("\nUsage:", sys.argv[0], "[bench_mixed_precision]\n")
    quit()

# Note the inputs are grid coordinates and the target is a checkerboard
inputs = grid(384, 384).unsqueeze(0).to("xpu")
targets = checker(384, 384, 8).unsqueeze(0).unsqueeze(1).to("xpu")

class Net(torch.jit.ScriptModule):
  def __init__(self):
    super().__init__()
    self.net = torch.nn.Sequential(
      torch.nn.Conv2d(2, 256, 1),
      torch.nn.BatchNorm2d(256),
      torch.nn.ReLU(),
      torch.nn.Conv2d(256, 256, 1),
      torch.nn.BatchNorm2d(256),
      torch.nn.ReLU(),
      torch.nn.Conv2d(256, 256, 1),
      torch.nn.BatchNorm2d(256),
      torch.nn.ReLU(),
      torch.nn.Conv2d(256, 1, 1))

  @torch.jit.script_method
  def forward(self, x):
    return self.net(x)

net = Net().to("xpu")
loss_fn = torch.nn.MSELoss().to("xpu")
opt = torch.optim.Adam(net.parameters(), 0.001)
net, opt = ipex.optimize(net, optimizer=opt, dtype=torch.float32)

print("Starting training loop, please be patient...")

start_time = time.time()

# for i in range(400):
for i in range(3):
  opt.zero_grad()
  if len(sys.argv) > 1 and sys.argv[1] == "bench_mixed_precision":
      with torch.xpu.amp.autocast(enabled=True, dtype=torch.float16):
        outputs = net(inputs)
        loss = loss_fn(outputs, targets)
  else:
      outputs = net(inputs)
      loss = loss_fn(outputs, targets)
  loss.backward()
  opt.step()
#  if (i + 1) % 50 == 0:
  if (i + 1) % 1 == 0:
      print(loss)
      print("Completed iteration %d/%d" % (i + 1, 400))

torch.xpu.synchronize()
print(f"Training completed in {time.time() - start_time} seconds :)")
print(loss)

Running the script without doing export OverrideDefaultFP64Settings=1 && export IGC_EnableDPEmulation=1 yields the following output:

root@d8b5bd7be0b9:/workspace# python3 bench_mixed-precision_vs_single-precision_pytorch_ipex.py | tee ipex_bench_output1.txt
[CRITICAL ERROR] Kernel '_ZTSZZN2at15AtenIpexTypeXPU17dpcppMemoryScale1IffEEvPT_PKT0_mdENKUlRN2cl4sycl7handlerEE_clESA_EUlNS8_4itemILi1ELb1EEEE_' removed due to usage of FP64 instructions unsupported by the targeted hardware. Running this kernel may result in unexpected results.
...
[CRITICAL ERROR] Kernel '_ZTSZZN2at15AtenIpexTypeXPUL24launch_vectorized_kernelINS0_13BUnaryFunctorIddbZZZNS0_4impl15gt_kernel_dpcppERNS_14TensorIteratorEENKUlvE_clEvENKUlvE1_clEvEUlddE_EEN3xpu5dpcpp5ArrayIPcLi2EEE23TrivialOffsetCalculatorILi1EjEEEvlRKT_T0_T1_iENKUlRN2cl4sycl7handlerEE0_clESP_EUlNSN_7nd_itemILi1EEEE_' removed due to usage of FP64 instructions unsupported by the targeted hardware. Running this kernel may result in unexpected results.
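
Messages of this form can repeat many times for the same kernel, which makes it hard to see how many distinct kernels are affected. The following is a minimal sketch, assuming every message follows the exact format shown above; the helper name summarize_removed_kernels is hypothetical and not part of IPEX or PyTorch.

```python
import re
from collections import Counter

# Matches the driver message shown above and captures the mangled kernel name.
KERNEL_REMOVED = re.compile(
    r"\[CRITICAL ERROR\] Kernel '([^']+)' removed due to usage of FP64 instructions"
)


def summarize_removed_kernels(lines):
    """Count how often each kernel name appears in FP64-removal messages.

    `lines` is any iterable of log lines; lines that do not match are ignored.
    """
    counts = Counter()
    for line in lines:
        match = KERNEL_REMOVED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts
```

Passing the lines of a captured log to this function returns a Counter keyed by kernel name, so counts.most_common() lists the most frequently reported kernels first.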

And if I enable FP64 emulation by doing export OverrideDefaultFP64Settings=1 && export IGC_EnableDPEmulation=1 I get the following output instead:

root@d8b5bd7be0b9:/workspace# python3 bench_mixed-precision_vs_single-precision_pytorch_ipex.py 2>&1 | tee ipex_bench_output2.txt
Starting training loop, please be patient...
Traceback (most recent call last):
  File "/workspace/bench_mixed-precision_vs_single-precision_pytorch_ipex.py", line 75, in <module>
    print(loss)
  File "/opt/intel/oneapi/intelpython/latest/lib/python3.9/site-packages/torch/_tensor.py", line 249, in __repr__
    return torch._tensor_str._str(self)
  File "/opt/intel/oneapi/intelpython/latest/lib/python3.9/site-packages/torch/_tensor_str.py", line 415, in _str
    return _str_intern(self)
  File "/opt/intel/oneapi/intelpython/latest/lib/python3.9/site-packages/torch/_tensor_str.py", line 390, in _str_intern
    tensor_str = _tensor_str(self, indent)
  File "/opt/intel/oneapi/intelpython/latest/lib/python3.9/site-packages/torch/_tensor_str.py", line 251, in _tensor_str
    formatter = _Formatter(get_summarized_data(self) if summarize else self)
  File "/opt/intel/oneapi/intelpython/latest/lib/python3.9/site-packages/torch/_tensor_str.py", line 102, in __init__
    if value != torch.ceil(value):
RuntimeError: Native API failed. Native API returns: -1 (CL_DEVICE_NOT_FOUND) -1 (CL_DEVICE_NOT_FOUND)
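
For reference, the two variables exported in the shell for this run could also be set from inside the script. This is only a sketch: the helper name enable_fp64_emulation is hypothetical, and it assumes the variables have to be in the environment before torch and intel_extension_for_pytorch are imported, which this thread does not confirm.

```python
import os

# The two variables named in this issue; both are set to "1" to request
# FP64 emulation.
FP64_EMULATION_ENV = {
    "OverrideDefaultFP64Settings": "1",
    "IGC_EnableDPEmulation": "1",
}


def enable_fp64_emulation(environ=None):
    """Set the FP64 emulation variables in `environ` (default: os.environ)."""
    if environ is None:
        environ = os.environ
    for name, value in FP64_EMULATION_ENV.items():
        environ[name] = value
    return {name: environ[name] for name in FP64_EMULATION_ENV}


# Assumption: call this at the very top of the script, before the lines
# "import torch" and "import intel_extension_for_pytorch as ipex".
enable_fp64_emulation()
```

Setting the variables this way only changes how they are supplied. It is not expected to change the CL_DEVICE_NOT_FOUND failure shown above.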

As you can see, the script I am running (and probably many more scripts I may run in the future) requires FP64 instructions, but because support for such instructions is broken right now, as shown above, I cannot run the workloads I’d like to run using the IPEX binary wheel that you have distributed (the same issue occurs when I build my own wheels from source). Could you please fix this so that I can properly take advantage of FP64 emulation in IPEX, or at least advise me on how to modify my script so that it runs without needing to enable FP64 emulation?
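
One possible modification, offered as an untested assumption: the traceback above ends inside torch/_tensor_str.py, so the error is raised while print(loss) formats the tensor, not during the training step. Copying the scalar to the host before printing might keep that formatting code off the XPU device. The helper name loss_to_float below is hypothetical; it relies only on the detach(), cpu() and item() methods that torch.Tensor provides.

```python
def loss_to_float(loss):
    """Return a 0-dim tensor-like `loss` as a plain Python float.

    The value is detached from the autograd graph and copied to the host
    first, so printing the result does not go through the tensor formatter.
    """
    return float(loss.detach().cpu().item())


# Intended use in the training loop (assumption, not verified on XPU):
#   print(loss_to_float(loss))    instead of    print(loss)
```

Whether this avoids every FP64 kernel on this hardware is not established by this thread. It only sidesteps the formatting path that appears in the traceback.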

I’ll be more than happy to provide additional info about my system if needed to help solve this issue. 😄

Issue Analytics

Created 10 months ago
Comments: 9 (4 by maintainers)

Top GitHub Comments

xsacha commented, Nov 23, 2022

I also get these messages printed out despite having absolutely no FP64 code. The message does not appear if you disable profiling.

Reproducer:

import torch
import torch.nn as nn
import intel_extension_for_pytorch as ipex

if __name__ == '__main__':
    x = nn.Sequential(nn.Conv2d(3, 64, (3, 3)),
                      nn.BatchNorm2d(64)) 
    inp = torch.randn(1,3,64,64)
    traced = torch.jit.trace(x, inp)
    traced.to('xpu')
    inp = inp.to('xpu')
    # Run twice
    out = traced(inp)
    out = traced(inp)

Output (100 or so of these):

[CRITICAL ERROR] Kernel '_ZTSZZN2at15AtenIpexTypeXPU17dpcppMemoryScale2IffEEvPT_PKT0_mfdENKUlRN2cl4sycl7handlerEE_clESA_EUlNS8_4itemILi1ELb1EEEE_' removed due to usage of FP64 instructions unsupported by the targeted hardware. Running this kernel may result in unexpected results.

gujinghui commented, Nov 17, 2022

@tedliosu This is our first release, aimed mainly at data center GPUs. We are taking client dGPUs into account. Please give us more time.

The Arc Alchemist dGPU has a similar architecture, so XPU should work on it as well. But without full verification, I cannot ask you to take the risk of wasting your money. 😦
