$ cat avx512_include.cpp
#include <immintrin.h>
int main() {}
$ g++ -march=core2 -E -P avx512_include.cpp | wc -l
34365
$ g++ -march=skylake-avx512 -E -P avx512_include.cpp | wc -l
34257
I know that I could theoretically hack around my GCC installation and pray that it still works after I remove all the content from the avx512 headers 😈, but I am looking for a supported way, if there is one.
I have tried looking inside the GCC headers for some macro check that controls whether the avx512 headers get included, but they seem to be included unconditionally.
P.S. I tried to find a gcc -march tag but couldn't find one; if somebody knows more appropriate tags besides the current ones, please comment.
GCC and clang support __attribute__((target("avx512f"))) for using AVX-512 intrinsics in a file compiled without a -march= option that implies -mavx512f. So immintrin.h still has to pull in the AVX-512 definitions.
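As a minimal sketch of that use case (the names add16, add16_avx512, and add16_scalar are made up for illustration), a file compiled without any AVX-512 -march= can still contain one AVX-512F function and select it at run time with GCC/clang's __builtin_cpu_supports. This only compiles because immintrin.h declares the _mm512 intrinsics even though AVX-512 isn't enabled for the file as a whole.

```c
#include <immintrin.h>

/* Compiled for AVX-512F even though the rest of the file is not. */
__attribute__((target("avx512f")))
static void add16_avx512(const float *a, const float *b, float *out)
{
    __m512 va = _mm512_loadu_ps(a);
    __m512 vb = _mm512_loadu_ps(b);
    _mm512_storeu_ps(out, _mm512_add_ps(va, vb));
}

/* Plain C fallback for CPUs without AVX-512F. */
static void add16_scalar(const float *a, const float *b, float *out)
{
    for (int i = 0; i < 16; i++)
        out[i] = a[i] + b[i];
}

/* Choose an implementation at run time; only calls the AVX-512 version
   if the CPU actually supports it. */
void add16(const float *a, const float *b, float *out)
{
    if (__builtin_cpu_supports("avx512f"))
        add16_avx512(a, b, out);
    else
        add16_scalar(a, b, out);
}
```

Because of the runtime check, add16 is safe to call on any x86-64 CPU.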
But you'll still get compile errors from GCC and clang if you try to use __m512 v = _mm512_set1_ps(1.0f); without enabling AVX-512 for that code (with -march=, -mavx512f, or a target attribute). See:
The Effect of Architecture When Using SSE / AVX Intrinisics
What exactly do the gcc compiler switches (-mavx -mavx2 -mavx512f) do?
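If you'd rather decide at compile time, GCC and clang predefine __AVX512F__ when AVX-512F is enabled (e.g. by -march=skylake-avx512 or -mavx512f), so a sketch like the following keeps the 512-bit intrinsics out of builds that don't enable it. The names sum16 and compiled_with_avx512f are hypothetical; _mm512_reduce_add_ps exists in the headers of reasonably recent GCC and clang versions.

```c
#include <immintrin.h>

/* 1 if this translation unit was compiled with AVX-512F enabled, else 0. */
int compiled_with_avx512f(void)
{
#ifdef __AVX512F__
    return 1;
#else
    return 0;
#endif
}

/* Sum 16 floats. The _mm512 calls are only compiled when __AVX512F__ is
   defined, so a build without AVX-512 never sees them and can't hit the
   target-mismatch errors. Note the two paths add in a different order, so
   results may differ in rounding for general inputs. */
float sum16(const float *p)
{
#ifdef __AVX512F__
    return _mm512_reduce_add_ps(_mm512_loadu_ps(p));
#else
    float s = 0.0f;
    for (int i = 0; i < 16; i++)
        s += p[i];
    return s;
#endif
}
```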
For shorter compile times, precompiled headers could perhaps be an option; most Linux distros don't ship precompiled versions of these headers by default.
Or in projects that won't use AVX-512 at all, yeah you could hack things up so the headers don't get included, or at least don't really get compiled.
Least intrusive might be to define GCC's include-guard macros before the files are included the first time. GCC's preprocessor will still read through the files, but the compiler proper won't spend any time on them. By size, avx512fintrin.h and avx512vlintrin.h are the big ones; AMX, VNNI, and so on are much smaller headers, so it's probably only worth bothering with avx512*.h.
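The trick works because each GCC intrinsics header wraps its body in #ifndef GUARD / #define GUARD ... #endif. Here is a single-file sketch of the mechanism, with a hypothetical guard name and a stand-in for the header body written inline rather than coming from an #include; a real header behaves the same way.

```c
/* Step 1: pre-define the guard, as you would for e.g.
   _AVX512FINTRIN_H_INCLUDED before the first #include <immintrin.h>. */
#define HYPOTHETICAL_BIG_INTRIN_H_INCLUDED

/* Step 2: stand-in for the header's contents. Since the guard is already
   defined, the preprocessor skips everything up to the matching #endif,
   so the compiler proper never sees these declarations. */
#ifndef HYPOTHETICAL_BIG_INTRIN_H_INCLUDED
#define HYPOTHETICAL_BIG_INTRIN_H_INCLUDED
#define HYPOTHETICAL_BIG_INTRIN_CONTENTS 1
static inline int hypothetical_big_function(int x) { return x * 2; }
#endif

/* Reports whether the guarded "header body" was compiled. */
int big_header_body_was_compiled(void)
{
#ifdef HYPOTHETICAL_BIG_INTRIN_CONTENTS
    return 1;
#else
    return 0;
#endif
}
```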
$ grep --no-filename '#define.*_INCLUDED$' /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/avx512*
#define _AVX5124FMAPSINTRIN_H_INCLUDED
#define _AVX5124VNNIWINTRIN_H_INCLUDED
#define _AVX512BF16INTRIN_H_INCLUDED
Maybe redirect that output into a skip-avx512.h header (append > skip-avx512.h to the grep command) and #include it before immintrin.h.
However, some later headers also define AVX-512 versions of their intrinsics, so the lack of definitions for __mmask8 and so on will break headers like gfniintrin.h and vaesintrin.h, which define AVX-512 intrinsics as well as non-AVX-512 VEX (and, for GFNI, legacy-SSE) intrinsics. You can work around this by adding definitions for the key types involved, or by skipping those headers entirely as well.
// Put this after the _AVX512*_INCLUDED #defines and before #include <immintrin.h>.
// Some headers aren't exclusively AVX-512 but have some AVX-512 intrinsics
#if 0
#define _GFNIINTRIN_H_INCLUDED // SSE / VEX / EVEX intrinsics
#define __VAESINTRIN_H_INCLUDED // wider VEX / EVEX versions of AES-NI
#define _VPCLMULQDQINTRIN_H_INCLUDED // wider VEX / EVEX versions of pclmul
#else // keep those headers happy
// These are the correct definitions for GCC as of 12.2, and highly unlikely to ever change.
// Exactness shouldn't matter anyway, except for matching the types the builtins expect.
typedef long long __m512i __attribute__((vector_size(64),may_alias)); // used in gfniintrin.h and vaesintrin.h
typedef char __v64qi __attribute__ ((__vector_size__ (64))); // used in gfniintrin.h
typedef long long __v8di __attribute__ ((__vector_size__ (64))); // used in vpclmulqdqintrin.h with optimization enabled
__attribute__((target("avx512f"))) static inline __m512i
_mm512_setzero_si512(void) { return (__m512i){0}; } // the real definition is more verbose
typedef unsigned char __mmask8; // used in gfniintrin.h
typedef unsigned short __mmask16;
typedef unsigned __mmask32;
typedef unsigned long long __mmask64;
#endif
This works with GCC 12.2 on my Arch Linux system. Compiling your empty testcase without optimization:
$ uname -a # x86-64 Arch GNU/Linux, using their binary packages
Linux volta 6.0.10-arch2-1 #1 SMP PREEMPT_DYNAMIC Sat, 26 Nov 2022 16:51:18 +0000 x86_64 GNU/Linux
$ gcc -v
gcc version 12.2.0 (GCC)
$ perf stat -r 20 gcc -c avx512-included.c # vanilla, no extra #define stuff
(average 265 ms on an i7-6700k at 3.9GHz)
$ perf stat -r 20 gcc -c avx512-excluded.c # with _AVX512 #defines and the typedefs
(average 98 ms on an i7-6700k at 3.9GHz)
With optimization enabled (-O2 -march=native on Skylake), it's 295 vs. 79 ms.
With -O2 and no -march= at all, it's 322 ms vs. 105 ms. Perhaps not having to run the #pragma GCC target() lines is what makes -march= faster. Defining _GFNIINTRIN_H_INCLUDED and the other two brings the faster time down to 95 ms, so apparently those relatively small headers make a big difference.
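For reference, GCC's intrinsics headers wrap their functions in #pragma GCC push_options / #pragma GCC target(...) / #pragma GCC pop_options, inside an #ifndef on the feature macro, so a -march= that already enables the feature skips those pragmas entirely. A sketch imitating that shape in user code, assuming GCC (the HYPOTHETICAL_DISABLE_AVX2 macro and the scale functions are made-up names; clang doesn't necessarily honor these GCC pragmas, which is harmless here because the body is plain C):

```c
/* Only switch targets if -march / -mavx2 hasn't already enabled AVX2,
   mirroring the #ifndef check in GCC's headers. */
#ifndef __AVX2__
#pragma GCC push_options
#pragma GCC target("avx2")
#define HYPOTHETICAL_DISABLE_AVX2
#endif

/* Compiled with AVX2 allowed; plain C, so GCC may auto-vectorize it
   with AVX2 instructions when optimizing. */
static void scale_avx2(float *p, int n, float k)
{
    for (int i = 0; i < n; i++)
        p[i] *= k;
}

#ifdef HYPOTHETICAL_DISABLE_AVX2
#undef HYPOTHETICAL_DISABLE_AVX2
#pragma GCC pop_options
#endif

/* Same loop, compiled for the baseline target. */
static void scale_generic(float *p, int n, float k)
{
    for (int i = 0; i < n; i++)
        p[i] *= k;
}

/* Runtime dispatch so the AVX2 version only runs on CPUs that have it. */
void scale(float *p, int n, float k)
{
    if (__builtin_cpu_supports("avx2"))
        scale_avx2(p, n, k);
    else
        scale_generic(p, n, k);
}
```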
-O2 probably has more lines to compile because intrinsics that take an immediate operand are defined as inline functions under #ifdef __OPTIMIZE__, and only as macros without it. (It may also spend some time optimizing those functions even when they don't get inlined anywhere.)
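The __OPTIMIZE__ split looks roughly like the sketch below; as far as I can tell, _mm_shuffle_epi32 in GCC's emmintrin.h is defined this way. hypothetical_rotl_imm is a made-up, portable imitation: the rotate itself doesn't actually need a constant, it only mirrors the header's structure, where the underlying builtin requires an immediate and only an optimized, inlined call can guarantee one. With -O2 the compiler has the function version to process; at -O0 it only sees the macro.

```c
#include <stdint.h>

#ifdef __OPTIMIZE__
/* Optimizing: a real always-inline function. After inlining, constant
   propagation turns imm back into a compile-time constant. */
static inline __attribute__((always_inline))
uint32_t hypothetical_rotl_imm(uint32_t x, const int imm)
{
    return (x << (imm & 31)) | (x >> ((32 - imm) & 31));
}
#else
/* Not optimizing: a function parameter would never become a constant,
   so a macro pastes the literal straight into the expression. */
#define hypothetical_rotl_imm(x, imm) \
    ((uint32_t)(((uint32_t)(x) << ((imm) & 31)) | \
                ((uint32_t)(x) >> ((32 - (imm)) & 31))))
#endif
```

Call sites look identical either way, which is why the difference only shows up as compile time.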
A more aggressive approach, which avoids even preprocessing those lines, would be to make a custom version of /usr/lib/gcc/x86_64-pc-linux-gnu/12.2.0/include/immintrin.h and simply strip out all the #include lines for avx512* headers, and others like avxvnniintrin.h, amx*, and shaintrin.h that are too new for you to want to use.
The same precautions for VAES, GFNI, and VPCLMUL are necessary if you don't want to skip them entirely.
Maybe call it custom_gcc_intrin.h and include that instead of immintrin.h? Or call it immintrin.h but put it in a directory inside that specific project that's searched first (e.g. via -I, which is searched before the system include directories)?
Don't modify your system copy of immintrin.h or set things up so a modified version is used by default; that will probably bite you at some point in the future when you try to compile something that does have AVX-512 code paths with runtime detection. You'll be super confused about why your GCC is broken if you've forgotten about a modification you made years ago.