Documentation on Clang’s optimization options is hard to find, especially for newer versions of LLVM, so I collected the options by hand and organized them in a table. This article and the table were inspired by and forked from lolo32, many thanks!
Two versions of Clang currently coexist on my Intel NUC 10: Clang 12.0.1 (built from source) and Clang 11.0.0 (installed through yum). The OS information is given in the title.
Version 1:

    Clang version 12.0.1
    Target: x86_64-unknown-linux-gnu
    Thread model: posix

This was produced with the following commands:

    echo 'int;' | clang++-12 -xc -O0 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O1 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O2 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -O3 - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Ofast - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Os - -o /dev/null -###
    echo 'int;' | clang++-12 -xc -Oz - -o /dev/null -###

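Collecting the flags by hand is tedious, so here is a rough Python sketch of how the same comparison could be automated. It is only an illustration under a few assumptions: the helper names (`cc1_args`, `run_driver`, `flag_matrix`) are made up for this example, and it assumes the `-###` output prints each command as one line of double-quoted arguments on stderr, which is what the commands above produced for me.

```python
import shlex
import subprocess

LEVELS = ["-O0", "-O1", "-O2", "-O3", "-Ofast", "-Os", "-Oz"]


def cc1_args(driver_output):
    """Return the argument list of the first -cc1 command in `-###` output."""
    for line in driver_output.splitlines():
        line = line.strip()
        if not line.startswith('"'):
            continue  # skip banner lines such as the version and target
        args = shlex.split(line)
        if "-cc1" in args:
            return args[args.index("-cc1"):]
    return []


def run_driver(level, compiler="clang++-12"):
    """Run the article's command for one level and return its stderr text."""
    proc = subprocess.run(
        [compiler, "-xc", level, "-", "-o", "/dev/null", "-###"],
        input="int;", capture_output=True, text=True,
    )
    return proc.stderr


def flag_matrix(outputs):
    """Map each flag to the set of levels whose cc1 command contains it.

    `outputs` maps a level such as "-O2" to the raw `-###` text.
    Only arguments that look like flags are kept; values such as
    "x86-64" and the bare "-" (stdin) are skipped.
    """
    matrix = {}
    for level, text in outputs.items():
        for arg in cc1_args(text):
            if arg.startswith("-") and len(arg) > 1:
                matrix.setdefault(arg, set()).add(level)
    return matrix


# Hypothetical usage (requires the compiler to be installed):
#   matrix = flag_matrix({lvl: run_driver(lvl) for lvl in LEVELS})
#   print(sorted(matrix["-mrelax-all"]))
```

Because flags and their values are separate arguments in the cc1 command line, this sketch records only the flag names; pairing a flag such as -target-cpu with its value would need extra handling.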
`-O0` means “no optimization”: this level compiles the fastest and generates the most debuggable code. It also enables the `-mrelax-all` option.

`-O1` is somewhere between `-O0` and `-O2`.

`-O2` is a moderate level of optimization that enables most optimizations.

`-O3` is like `-O2`, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).

`-Ofast` enables `-O3` along with other aggressive optimizations that may violate strict compliance with language standards. It speeds up math calculations by enabling `-ffast-math`, which assumes that: 1. floating-point math obeys regular algebraic rules for real numbers (e.g. `+` and `*` are associative, `x/y == x * (1/y)`, and `(a + b) * c == a * c + b * c`); 2. operands to floating-point operations are never `NaN` or `Inf`; and 3. `+0` and `-0` are interchangeable. `-ffast-math` also defines the `__FAST_MATH__` preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of `-ffp-contract=fast`, using an option that disables any of the individual optimizations in `-ffast-math` causes `__FAST_MATH__` to no longer be set.

`-Os` is like `-O2` with extra optimizations to reduce code size.

`-Oz` is like `-Os`, but reduces code size even further.
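To make the three fast-math assumptions concrete, the following Python sketch shows that none of them holds for ordinary IEEE 754 doubles. It does not invoke Clang; it assumes Python floats are IEEE 754 doubles (true on common platforms), and the function names are invented for this illustration. The point is only that a compiler allowed to rewrite expressions this way can produce different results.

```python
import math


def reassociation_changes_result():
    """Assumption 1: + is associative. In floating point it is not."""
    left = (0.1 + 0.2) + 0.3
    right = 0.1 + (0.2 + 0.3)
    return left, right


def reciprocal_changes_result():
    """Assumption 1: x/y == x * (1/y). Rounding can make them differ."""
    return 3.0 / 10.0, 3.0 * (1.0 / 10.0)


def nan_breaks_comparisons():
    """Assumption 2: no NaN operands. A real NaN is not even equal to itself."""
    nan = float("nan")
    return nan == nan, math.isnan(nan)


def zeros_are_distinguishable():
    """Assumption 3: +0 and -0 are interchangeable. Their signs differ."""
    return math.copysign(1.0, 0.0), math.copysign(1.0, -0.0)


for check in (reassociation_changes_result, reciprocal_changes_result,
              nan_breaks_comparisons, zeros_are_distinguishable):
    print(check.__name__, check())
```

Code that depends on any of these distinctions (for example, checking for NaN after a computation) is the kind that may behave differently under `-Ofast`.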
The table below lists the flags passed at each optimization level.

| Option | -O0 | -O1 | -O2 | -O3 | -Ofast | -Os | -Oz | Description |
|---|---|---|---|---|---|---|---|---|
| -cc1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The frontend |
| -triple x86_64-unknown-linux-gnu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Specify target triple (architecture) |
| -emit-obj | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit native object files |
| -mrelax-all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | (integrated-as) Relax all machine instructions |
| --mrelax-relocations | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Controls whether the assembler should generate relax relocations |
| -disable-free | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Disable freeing of memory on exit |
| -mrelocation-model static | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The relocation model to use (static means no position-independent code is generated) |
| -mframe-pointer=all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Keep frame pointers |
| -mframe-pointer=none | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Eliminate frame pointers, which point to the base address of the function’s frame |
| -menable-no-inf | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no infinities. |
| -menable-no-nans | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no NaNs. |
| -menable-unsafe-fp-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow unsafe floating-point math optimizations which may decrease precision |
| -fno-signed-zeros | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimizations for floating-point arithmetic that ignore the signedness of zero |
| -mreassociate | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow reassociation transformations for floating-point instructions |
| -freciprocal-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow division operations to be reassociated |
| -fdenormal-fp-math=preserve-sign,preserve-sign | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Select which denormal numbers the code is permitted to require. |
| -ffp-contract=fast | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Form fused FP ops (e.g. FMAs): fast (everywhere) OR on (according to FP_CONTRACT pragma, default) OR off (never fuse) |
| -fmath-errno | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | Require math functions to indicate errors by setting errno |
| -fno-rounding-math | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not require floating-point operations to honor the dynamically-set rounding mode (the default). |
| -ffast-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math |
| -ffinite-math-only | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf. |
| -mconstructor-aliases | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Enable constructor aliases |
| -munwind-tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Generate unwinding tables for all functions |
| -target-cpu x86-64 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Target a specific CPU type |
| -tune-cpu generic | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit instructions for one CPU (possibly an old baseline such as generic x86-64) but schedule (order) them for another (possibly a more common one such as broadwell or znver2). Same as -mtune on GCC |
| -fno-split-dwarf-inlining | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not provide minimal debug info in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF |
| -debugger-tuning=gdb | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Tune the debug info for gdb |
| -internal-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| -internal-externc-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| -resource-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The directory which holds the compiler resource files |
| -fdebug-compilation-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The compilation directory to embed in the debug info. |
| -ferror-limit 19 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set the maximum number of errors to emit before stopping (0 = no limit). |
| -fgnuc-version=4.2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Sets various macros to claim compatibility with the given GCC version |
| -fcolor-diagnostics | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Use colors in diagnostics |
| -faddrsig | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit an address-significance table |
| -vectorize-loops | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | Run the loop vectorization passes |
| -vectorize-slp | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the SLP vectorization passes |
| -main-file-name | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |

And Version 2:

    Clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix

The same commands were used, with clang++-12 replaced by clang++ in each.

| Option | -O0 | -O1 | -O2 | -O3 | -Ofast | -Os | -Oz | Description |
|---|---|---|---|---|---|---|---|---|
| -cc1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The frontend |
| -triple x86_64-unknown-linux-gnu | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Specify target triple (architecture) |
| -emit-obj | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit native object files |
| -mrelax-all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | (integrated-as) Relax all machine instructions |
| -disable-free | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Disable freeing of memory on exit |
| -disable-llvm-verifier | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Don’t run the LLVM IR verifier pass |
| -discard-value-names | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Discard value names when generating LLVM IR |
| -mrelocation-model static | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The relocation model to use (static means no position-independent code is generated) |
| -mframe-pointer=all | ✅ | ❌ | ❌ | ❌ | ❌ | ❌ | ❌ | Keep frame pointers |
| -mframe-pointer=none | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Eliminate frame pointers, which point to the base address of the function’s frame |
| -menable-no-inf | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no infinities. |
| -menable-no-nans | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimization to assume there are no NaNs. |
| -menable-unsafe-fp-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow unsafe floating-point math optimizations which may decrease precision |
| -fno-signed-zeros | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow optimizations for floating-point arithmetic that ignore the signedness of zero |
| -mreassociate | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow reassociation transformations for floating-point instructions |
| -freciprocal-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow division operations to be reassociated |
| -fdenormal-fp-math=preserve-sign,preserve-sign | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Select which denormal numbers the code is permitted to require. |
| -ffp-contract=fast | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Form fused FP ops (e.g. FMAs): fast (everywhere) OR on (according to FP_CONTRACT pragma, default) OR off (never fuse) |
| -fmath-errno | ✅ | ✅ | ✅ | ✅ | ❌ | ✅ | ✅ | Require math functions to indicate errors by setting errno |
| -fno-rounding-math | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not require floating-point operations to honor the dynamically-set rounding mode (the default). |
| -ffast-math | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math |
| -ffinite-math-only | ❌ | ❌ | ❌ | ❌ | ✅ | ❌ | ❌ | Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf. |
| -mconstructor-aliases | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Enable constructor aliases |
| -munwind-tables | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Generate unwinding tables for all functions |
| -target-cpu x86-64 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Target a specific CPU type |
| -fno-split-dwarf-inlining | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Do not provide minimal debug info in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF |
| -debugger-tuning=gdb | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Tune the debug info for gdb |
| -internal-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| -internal-externc-isystem | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |
| -resource-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The directory which holds the compiler resource files |
| -fdebug-compilation-dir | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | The compilation directory to embed in the debug info. |
| -ferror-limit 19 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Set the maximum number of errors to emit before stopping (0 = no limit). |
| -fgnuc-version=4.2.1 | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Sets various macros to claim compatibility with the given GCC version |
| -fcolor-diagnostics | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Use colors in diagnostics |
| -faddrsig | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | Emit an address-significance table |
| -vectorize-loops | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ❌ | Run the loop vectorization passes |
| -vectorize-slp | ❌ | ❌ | ✅ | ✅ | ✅ | ✅ | ✅ | Run the SLP vectorization passes |
| -main-file-name | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | ✅ | |

Comparing the two tables, the main differences in optimization parameters between the two versions are the CPU tune option (-tune-cpu) and relax relocations (--mrelax-relocations), both of which appear only with Clang 12.
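A comparison like this can also be done mechanically with set differences. The sketch below is only illustrative: `diff_flags` is a name made up here, and the two lists are a short excerpt of flag names from the tables rather than the full flag sets.

```python
def diff_flags(old_flags, new_flags):
    """Return the flags that appear in only one of the two collections."""
    old, new = set(old_flags), set(new_flags)
    return {"added": sorted(new - old), "removed": sorted(old - new)}


# Abbreviated excerpt of flag names taken from the tables (not the full lists).
clang11_excerpt = ["-cc1", "-emit-obj", "-target-cpu", "-faddrsig"]
clang12_excerpt = ["-cc1", "-emit-obj", "-target-cpu", "-faddrsig",
                   "-tune-cpu", "--mrelax-relocations"]

print(diff_flags(clang11_excerpt, clang12_excerpt))
```

Feeding it the complete flag lists for each version and level would give a per-level diff instead of a manual read of the tables.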