[CodeGen] Incorrect vectorization for fp16 on X86-64 (AVX512)
Hi,
There seems to be a bug in vectorization of fp16 on Skylake machines. When fp16 values are mapped to uint8 using select on an AVX512 target, the result is wrong if vectorization is on.
The generated LLVM IR (ll) looks correct after checking the dumped code, but the generated asm looks problematic. It might be caused by some configuration during code generation in /src/codegen/llvm/llvm_common.cc.
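One way to make that comparison concrete is to pull both dumps from the built module. This is only a minimal sketch: it assumes the LLVM-backed module returned by tvm.build accepts get_source("ll") and get_source("asm"), and the helper name dump_ir_and_asm is hypothetical. It is meant to be called on func from the script below.

```python
def dump_ir_and_asm(module):
    """Return (llvm_ir, assembly) text for a built module.

    Assumption: `module` exposes get_source(fmt) and accepts the
    formats "ll" and "asm", as an LLVM-backed TVM module should.
    """
    llvm_ir = module.get_source("ll")     # LLVM IR before instruction selection
    assembly = module.get_source("asm")   # target assembly after code generation
    return llvm_ir, assembly
```

For example, calling ll, asm = dump_ir_and_asm(func) after tvm.build gives two strings in which the vector compare and select can be inspected side by side.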
import tvm
import numpy
## shape that causes a wrong result
m=3
n=2
k=2
## shape that causes an LLVM ERROR
# m=3
# n=1
# k=4
dtype = "float16"
target = 'llvm -mcpu=skylake-avx512'
ctx = tvm.context(target, 0)
input = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
in_array = numpy.array(input).reshape(m,n,k).astype(dtype)
a = tvm.nd.array(in_array, ctx)
X=tvm.placeholder((m,n,k), name='X', dtype="float16")
zero = tvm.const(0, "uint8")
one = tvm.const(1, "uint8")
z16 = tvm.const(0, "float16")
Y=tvm.compute((m,n,k), lambda i,j,k: tvm.expr.Select(X[i,j,k] == z16, zero, one))
s=tvm.create_schedule(Y.op)
s[Y].vectorize(Y.op.axis[2])
print tvm.lower(s,[X,Y], simple_mode=True)
func = tvm.build(s, [X,Y], target=target, name='bug')
assert func
output = [False, True, True, True, True, True, True, True, True, True, True, True]
out = numpy.array(output).reshape(m,n,k).astype("bool")
b= tvm.nd.array(numpy.zeros((m,n,k), dtype="bool"), ctx)
func(a,b)
tvm.testing.assert_allclose(b.asnumpy(), out)
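For reference, the element-wise result the kernel should produce can be modeled without TVM. The following is a sketch in plain Python, assuming fp16 rounding through the struct module's half-precision format; to_fp16 and reference_select are illustrative names, not TVM APIs.

```python
import struct

def to_fp16(value):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", value))[0]

def reference_select(values):
    """Scalar model of Select(x == 0, 0, 1) applied to each element."""
    return [0 if to_fp16(v) == 0.0 else 1 for v in values]

# Same twelve inputs as the reproduction script: 0.0 through 11.0.
inputs = [float(i) for i in range(12)]
expected = reference_select(inputs)
print(expected)
```

Only the first element is zero, so a correct vectorized kernel should yield a single 0 followed by eleven 1s, whatever the (m, n, k) shape is.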
Issue Analytics
- State:
- Created 4 years ago
- Comments:12 (6 by maintainers)
Top GitHub Comments
FYI, I think I might have fixed the LLVM bug this was hitting here: https://github.com/llvm/llvm-project/commit/7b49e8ac359bc35f95af548fbed4b7afd625caab
I was asked to look at why -mcpu=cascadelake didn’t generate the same code as -mcpu=skylake-avx512 when using tvm with llvm 8.0. After ruling out llvm I started poking around tvm and happened to see this avx512 issue in my search results. Coincidentally we had hit the same underlying llvm bug internally last week with a different frontend.