It is not easy to find documentation on Clang's optimization options, especially for newer versions of LLVM, so I collected them by hand and organized them into tables. This article and its tables were inspired by and forked from lolo32; many thanks!
Currently, two versions of Clang co-exist on my Intel NUC 10: Clang 12.0.1 (installed from source) and Clang 11.0.0 (installed through yum). The OS information is as given in the title.
Version 1:

    Clang version 12.0.1
    Target: x86_64-unknown-linux-gnu
    Thread model: posix

This was made with the commands:

    echo 'int;' | clang++-12 -xc -O0 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O1 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O2 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O3 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Ofast - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Os - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Oz - -o /dev/null -###

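
The seven commands above can also be run in a loop, with the `-cc1` line of each `-###` dump split into one argument per line so the levels are easy to compare. The following is a minimal sketch, not the procedure used for this article: the `CLANG` variable, the function names and the output file names are all assumptions made for illustration. Values that belong to an option (such as the triple after `-triple`) end up on their own lines, and arguments containing spaces would be split incorrectly.

```shell
#!/bin/sh
# Sketch only. CLANG is an assumed variable naming the driver to inspect,
# for example CLANG=clang++-12.
CLANG="${CLANG:-clang}"

# Read "-###" output on stdin, keep only the cc1 invocation, and print its
# arguments one per line, unquoted, sorted and de-duplicated.
split_cc1_args() {
  grep -- '"-cc1"' | tr ' ' '\n' | tr -d '"' | grep -v '^$' | sort -u
}

# Write the cc1 arguments of every optimization level into the directory
# given as $1, one file per level (O0.txt, O1.txt, ..., Oz.txt).
dump_levels() {
  outdir=$1
  for level in -O0 -O1 -O2 -O3 -Ofast -Os -Oz; do
    echo 'int;' | "$CLANG" -xc "$level" - -o /dev/null -### 2>&1 \
      | split_cc1_args > "$outdir/${level#-}.txt"
  done
}
```

A possible use is `mkdir flags-12 && CLANG=clang++-12 dump_levels flags-12`, followed by `diff flags-12/O0.txt flags-12/O1.txt` to see what changes between two levels.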
- `-O0` means "no optimization": this level compiles the fastest and generates the most debuggable code. It enables the `-mrelax-all` option.
- `-O1` is somewhere between `-O0` and `-O2`.
- `-O2` is a moderate level of optimization which enables most optimizations.
- `-O3` is like `-O2`, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
- `-Ofast` enables `-O3` together with other aggressive optimizations that may violate strict compliance with language standards. It speeds up math calculations by turning on `-ffast-math`, which lets the compiler assume that:
  1. Floating-point math obeys the regular algebraic rules for real numbers (e.g. `+` and `*` are associative, `x/y == x * (1/y)`, and `(a + b) * c == a * c + b * c`).
  2. Operands to floating-point operations are never `NaN` or `Inf`.
  3. `+0` and `-0` are interchangeable.

  `-ffast-math` also defines the `__FAST_MATH__` preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of `-ffp-contract=fast`, using any of the options below to disable any of the individual optimizations in `-ffast-math` will cause `__FAST_MATH__` to no longer be set.
- `-Os` is like `-O2` with extra optimizations to reduce code size.
- `-Oz` is like `-Os`, but tries to reduce code size even further.
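
To make the `-Ofast` assumptions more concrete, here is a small sketch with two helpers. The first uses `awk`, which computes in double precision, to show that floating-point addition is not associative, which is why letting the compiler regroup operations can change results. The second checks whether the compiler defines `__FAST_MATH__` for a given set of flags by dumping its predefined macros with `-dM -E`. The `CLANG` variable and both function names are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch only. CLANG is an assumed variable naming the driver to inspect.
CLANG="${CLANG:-clang}"

# Add 0.1, 0.2 and 0.3 in both possible groupings with awk's
# double-precision arithmetic and print the two sums with 17 digits.
grouping_demo() {
  awk 'BEGIN {
    left  = (0.1 + 0.2) + 0.3
    right = 0.1 + (0.2 + 0.3)
    printf "%.17g %.17g\n", left, right
  }'
}

# Succeed if the compiler defines __FAST_MATH__ when given the flags in "$@",
# for example: fast_math_macro_set -Ofast
fast_math_macro_set() {
  "$CLANG" -dM -E -xc "$@" - < /dev/null | grep -q '__FAST_MATH__'
}
```

Running `grouping_demo` prints two sums that differ in their last digits on IEEE-754 systems. With a compiler installed, `fast_math_macro_set -Ofast && echo defined` and `fast_math_macro_set -O3 || echo "not defined"` can be used to compare levels.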
The tables are below.
Option | -O0 | -O1 | -O2 | -O3 | -Ofast | -Os | -Oz | Description |
---|---|---|---|---|---|---|---|---|
-cc1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the compiler frontend |
-triple x86_64-unknown-linux-gnu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Specify the target triple (architecture) |
-emit-obj | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit native object files |
-mrelax-all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | (integrated assembler) Relax all machine instructions |
--mrelax-relocations | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Let the assembler generate relaxable relocations |
-disable-free | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Disable freeing of memory on exit |
-mrelocation-model static | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The relocation model to use (static: non-position-independent code) |
-mframe-pointer=all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Keep frame pointers |
-mframe-pointer=none | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Omit frame pointers (the frame pointer holds the base address of the function's stack frame) |
-menable-no-inf | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no infinities |
-menable-no-nans | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no NaNs |
-menable-unsafe-fp-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow unsafe floating-point math optimizations which may decrease precision |
-fno-signed-zeros | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimizations for floating-point arithmetic that ignore the signedness of zero |
-mreassociate | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow reassociation transformations for floating-point instructions |
-freciprocal-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow division operations to be reassociated |
-fdenormal-fp-math=preserve-sign,preserve-sign | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Select which denormal numbers the code is permitted to require |
-ffp-contract=fast | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Form fused FP ops (e.g. FMAs): fast (everywhere) OR on (according to FP_CONTRACT pragma, default) OR off (never fuse) |
-fmath-errno | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | Require math functions to indicate errors by setting errno |
-fno-rounding-math | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Assume the default rounding mode (the opposite of -frounding-math, which forces floating-point operations to honor the dynamically-set rounding mode) |
-ffast-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Enable fast-math mode. This option lets the compiler make aggressive, potentially lossy assumptions about floating-point math |
-ffinite-math-only | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf |
-mconstructor-aliases | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Enable constructor aliases |
-munwind-tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Generate unwinding tables for all functions |
-target-cpu x86-64 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Target a specific CPU type |
-tune-cpu generic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Schedule (order) instructions for the given CPU model while still emitting only instructions allowed by -target-cpu; generic tunes for a broad range of CPUs rather than one model such as broadwell or znver2. Same as -mtune on GCC |
-fno-split-dwarf-inlining | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not emit the minimal debug info that -fsplit-dwarf-inlining places in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF |
-debugger-tuning=gdb | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Tune the debug info for the given debugger |
-internal-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
-internal-externc-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
-resource-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The directory which holds the compiler resource files |
-fdebug-compilation-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The compilation directory to embed in the debug info |
-ferror-limit 19 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set the maximum number of errors to emit before stopping (0 = no limit) |
-fgnuc-version=4.2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set various macros to claim compatibility with the given GCC version |
-fcolor-diagnostics | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Use colors in diagnostics |
-faddrsig | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit an address-significance table |
-vectorize-loops | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | Run the loop vectorization passes |
-vectorize-slp | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the SLP vectorization passes |
-main-file-name | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
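
A table like the one above does not have to be filled in by hand. As a rough sketch, assuming each optimization level's cc1 arguments have already been saved to a text file with one argument per line (the file names below are hypothetical), a row can be produced by checking each file for the flag. The function name `table_row` is also an assumption made for illustration.

```shell
#!/bin/sh
# Sketch only. Print one table row: the flag, then a ✅ or ❌ cell for each
# per-level file, depending on whether the file contains the flag as a
# whole line.
table_row() {
  flag=$1
  shift
  row="$flag |"
  for f in "$@"; do
    if grep -qxF -e "$flag" "$f"; then
      row="$row ✅ |"
    else
      row="$row ❌ |"
    fi
  done
  printf '%s\n' "$row"
}
```

For example, `table_row -vectorize-loops O0.txt O1.txt O2.txt O3.txt Ofast.txt Os.txt Oz.txt` prints the cells for one option in the same column order as the table. The description column still has to be written by hand.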
And Version 2:

    Clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix

The commands are the same, except that `clang++-12` is replaced with `clang++` in each.
Option | -O0 | -O1 | -O2 | -O3 | -Ofast | -Os | -Oz | Description |
---|---|---|---|---|---|---|---|---|
-cc1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the compiler frontend |
-triple x86_64-unknown-linux-gnu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Specify the target triple (architecture) |
-emit-obj | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit native object files |
-mrelax-all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | (integrated assembler) Relax all machine instructions |
-disable-free | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Disable freeing of memory on exit |
-disable-llvm-verifier | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Don't run the LLVM IR verifier pass |
-discard-value-names | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Discard value names when generating LLVM IR |
-mrelocation-model static | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The relocation model to use (static: non-position-independent code) |
-mframe-pointer=all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Keep frame pointers |
-mframe-pointer=none | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Omit frame pointers (the frame pointer holds the base address of the function's stack frame) |
-menable-no-inf | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no infinities |
-menable-no-nans | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no NaNs |
-menable-unsafe-fp-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow unsafe floating-point math optimizations which may decrease precision |
-fno-signed-zeros | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimizations for floating-point arithmetic that ignore the signedness of zero |
-mreassociate | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow reassociation transformations for floating-point instructions |
-freciprocal-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow division operations to be reassociated |
-fdenormal-fp-math=preserve-sign,preserve-sign | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Select which denormal numbers the code is permitted to require |
-ffp-contract=fast | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Form fused FP ops (e.g. FMAs): fast (everywhere) OR on (according to FP_CONTRACT pragma, default) OR off (never fuse) |
-fmath-errno | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | Require math functions to indicate errors by setting errno |
-fno-rounding-math | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Assume the default rounding mode (the opposite of -frounding-math, which forces floating-point operations to honor the dynamically-set rounding mode) |
-ffast-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Enable fast-math mode. This option lets the compiler make aggressive, potentially lossy assumptions about floating-point math |
-ffinite-math-only | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf |
-mconstructor-aliases | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Enable constructor aliases |
-munwind-tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Generate unwinding tables for all functions |
-target-cpu x86-64 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Target a specific CPU type |
-fno-split-dwarf-inlining | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not emit the minimal debug info that -fsplit-dwarf-inlining places in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF |
-debugger-tuning=gdb | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Tune the debug info for the given debugger |
-internal-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
-internal-externc-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
-resource-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The directory which holds the compiler resource files |
-fdebug-compilation-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The compilation directory to embed in the debug info |
-ferror-limit 19 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set the maximum number of errors to emit before stopping (0 = no limit) |
-fgnuc-version=4.2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set various macros to claim compatibility with the given GCC version |
-fcolor-diagnostics | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Use colors in diagnostics |
-faddrsig | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit an address-significance table |
-vectorize-loops | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | Run the loop vectorization passes |
-vectorize-slp | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the SLP vectorization passes |
-main-file-name | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
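Comparing the two tables, the main differences in the optimization-related parameters are that Clang 12 introduces the CPU tuning option (-tune-cpu) and relaxable relocations (--mrelax-relocations). The Clang 11 table also lists -disable-llvm-verifier and -discard-value-names, which do not appear in the Clang 12 table.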
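
A comparison between two compiler versions can be checked mechanically as well. The sketch below assumes two text files, one per compiler, each holding the cc1 arguments of the same optimization level with one argument per line; the file names in the usage line and the function name `flags_only_in_second` are hypothetical.

```shell
#!/bin/sh
# Sketch only. Print the lines that occur in the second file but not in the
# first. comm needs sorted input, so both files are sorted into temporary
# copies before the comparison.
flags_only_in_second() {
  first_sorted=$(mktemp)
  second_sorted=$(mktemp)
  sort -u "$1" > "$first_sorted"
  sort -u "$2" > "$second_sorted"
  comm -13 "$first_sorted" "$second_sorted"
  rm -f "$first_sorted" "$second_sorted"
}
```

For example, `flags_only_in_second clang11-O2.txt clang12-O2.txt` lists what the newer driver adds at `-O2`, and swapping the two arguments lists what it no longer passes.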