target sse4.1
Hey,
It would be really handy for me if this could support SIMD instruction-set extensions such as AVX, and specifically SSE4.1. I'm hoping to benchmark simple SIMD math techniques so that I can make informed decisions about SIMD performance.
Here’s a little test I did to see if quick-bench would help me do what I’m trying to do:
#include <x86intrin.h>
static void DPPS(benchmark::State& state) {
  __m128 left, right;
  left = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
  right = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
  for (auto _ : state) {
    __m128 dotted = _mm_dp_ps(left, right, 0xff);
    benchmark::DoNotOptimize(dotted);
  }
  benchmark::DoNotOptimize(left);
  benchmark::DoNotOptimize(right);
}
// Register the function as a benchmark
BENCHMARK(DPPS);
static void MULHADD(benchmark::State& state) {
  __m128 left, right;
  left = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
  right = _mm_set_ps(1.0f, 2.0f, 3.0f, 4.0f);
  for (auto _ : state) {
    __m128 dotted = _mm_mul_ps(left, right);
    dotted = _mm_hadd_ps(dotted, dotted);
    dotted = _mm_hadd_ps(dotted, dotted);
    benchmark::DoNotOptimize(dotted);
  }
  benchmark::DoNotOptimize(left);
  benchmark::DoNotOptimize(right);
}
BENCHMARK(MULHADD);
The errors generated:
Error or timeout
bench-file.cpp:9:21: error: '__builtin_ia32_dpps' needs target feature sse4.1
__m128 dotted = _mm_dp_ps(left, right, 0xff);
^
/usr/lib/clang/5.0.0/include/smmintrin.h:620:12: note: expanded from macro '_mm_dp_ps'
(__m128) __builtin_ia32_dpps((__v4sf)(__m128)(X), \
^
bench-file.cpp:26:14: error: always_inline function '_mm_hadd_ps' requires target feature 'sse3', but would be inlined into function 'MULHADD' that is compiled without support for 'sse3'
dotted = _mm_hadd_ps(dotted, dotted);
^
bench-file.cpp:27:14: error: always_inline function '_mm_hadd_ps' requires target feature 'sse3', but would be inlined into function 'MULHADD' that is compiled without support for 'sse3'
dotted = _mm_hadd_ps(dotted, dotted);
^
3 errors generated.
Cheers
Issue Analytics
- State:
- Created 6 years ago
- Reactions:2
- Comments:6 (2 by maintainers)
@FredTingaud You can try adding “-march=native” to the compiler options.
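Where the compiler flags cannot be changed, a possible workaround is to enable the extension per function. The sketch below assumes GCC or a reasonably recent Clang on x86-64, both of which accept `__attribute__((target(...)))`. The helper names `dot4_dpps` and `dot4_hadd` are made up for illustration and are not part of the original benchmark. Whether this works on quick-bench itself has not been verified here.

```cpp
#include <immintrin.h>

// Assumption: GCC or Clang on x86-64. The target attribute turns on the
// named ISA extension for this one function only, so the rest of the
// translation unit can still be built without -msse4.1.
__attribute__((target("sse4.1")))
float dot4_dpps(__m128 a, __m128 b) {
    // Mask 0xff: multiply all four lanes and write the sum to every lane.
    return _mm_cvtss_f32(_mm_dp_ps(a, b, 0xff));
}

// Same dot product using SSE3 horizontal adds.
__attribute__((target("sse3")))
float dot4_hadd(__m128 a, __m128 b) {
    __m128 p = _mm_mul_ps(a, b);  // lane-wise products
    p = _mm_hadd_ps(p, p);        // pairwise sums
    p = _mm_hadd_ps(p, p);        // total in every lane
    return _mm_cvtss_f32(p);
}
```

The attribute only tells the compiler it may emit these instructions; the CPU running the benchmark still has to support them. A run-time guard such as `__builtin_cpu_supports("sse4.1")` (a GCC/Clang builtin) can be used before calling either helper.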
Running into this again 4 years later, so I’m back to +1 my own issue. 😃
This time I’m trying to benchmark __popcnt against other methods of counting bits in an integer.
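For reference, a comparison like this might look roughly as follows. It is only a sketch under a few assumptions: `__popcnt` is the MSVC spelling, so on GCC or Clang the example uses `_mm_popcnt_u32` instead, enabled per function with a target attribute. The function names and the two software variants chosen here (Kernighan's loop and a SWAR bit trick) are illustrative and may not be the methods the poster had in mind.

```cpp
#include <cstdint>
#include <immintrin.h>

// Hardware POPCNT instruction. Assumption: GCC or Clang on x86-64; the
// attribute enables the instruction for this function only.
__attribute__((target("popcnt")))
unsigned popcount_hw(std::uint32_t x) {
    return static_cast<unsigned>(_mm_popcnt_u32(x));
}

// Kernighan's method: each iteration clears the lowest set bit.
unsigned popcount_kernighan(std::uint32_t x) {
    unsigned n = 0;
    while (x != 0) {
        x &= x - 1;
        ++n;
    }
    return n;
}

// SWAR method: sum bits in parallel within 2-, 4-, then 8-bit fields.
unsigned popcount_swar(std::uint32_t x) {
    x = x - ((x >> 1) & 0x55555555u);
    x = (x & 0x33333333u) + ((x >> 2) & 0x33333333u);
    x = (x + (x >> 4)) & 0x0f0f0f0fu;
    return (x * 0x01010101u) >> 24;  // add the four byte counts
}
```

Each function could then be wrapped in its own `benchmark::State` loop in the same way as the `DPPS` and `MULHADD` benchmarks above, with `benchmark::DoNotOptimize` applied to the result.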