Documentation for Clang’s optimization options is hard to find, especially for newer versions of LLVM. So I collected the options manually and organized them into tables. This article and its tables were inspired by and forked from lolo32, many thanks!
Currently, two versions of Clang co-exist on my Intel NUC 10: Clang 12.0.1 (built from source) and Clang 11.0.0 (installed through yum). The OS information is given in the title. Version 1:

Clang version 12.0.1
Target: x86_64-unknown-linux-gnu
Thread model: posix

The table was produced with these commands:

echo 'int;' | clang++-12 -xc -O0    - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -O1    - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -O2    - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -O3    - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -Ofast - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -Os    - -o /dev/null -\#\#\#
echo 'int;' | clang++-12 -xc -Oz    - -o /dev/null -\#\#\#
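
Reading seven long command lines by eye is tedious, so the commands above can be wrapped in a small loop. The following is only a sketch: the `CLANG` variable, the `cc1_flags` helper and the `flags-<level>.txt` file names are my own assumptions for illustration, not something Clang provides. It relies on `-###` printing the quoted cc1 command line to stderr.

```shell
# Assumption: compiler name; set CLANG=clang++-12 or CLANG=clang++ as needed.
CLANG=${CLANG:-clang}

# Hypothetical helper: read `clang -###` output on stdin and print the
# arguments of the cc1 invocation, one per line, without the quotes.
cc1_flags() {
  grep -- '"-cc1"' | grep -o '"[^"]*"' | sed -e 's/^"//' -e 's/"$//'
}

if command -v "$CLANG" >/dev/null 2>&1; then
  outdir=$(mktemp -d)
  for level in O0 O1 O2 O3 Ofast Os Oz; do
    # -### only prints the commands (to stderr); nothing is compiled.
    echo 'int;' | "$CLANG" -xc "-$level" - -o /dev/null -\#\#\# 2>&1 |
      cc1_flags | sort -u > "$outdir/flags-$level.txt"
  done
  echo "flag lists saved in $outdir"
  # Column 1: arguments only at -O0; column 2: arguments only at -O2.
  comm -3 "$outdir/flags-O0.txt" "$outdir/flags-O2.txt"
fi
```

Because every quoted token becomes its own line, an option and its value (for example -target-cpu and x86-64) end up on separate lines, which is good enough for spotting differences between levels.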
  • -O0 means “no optimization”: this level compiles the fastest and generates the most debuggable code. It enables the -mrelax-all option.
  • -O1 is somewhere between -O0 and -O2.
  • -O2 is a moderate level of optimization which enables most optimizations.
  • -O3 is like -O2, except that it enables optimizations that take longer to perform or that may generate larger code (in an attempt to make the program run faster).
  • -Ofast enables -O3 plus other aggressive optimizations that may violate strict compliance with language standards. It speeds up math calculations by enabling -ffast-math, which lets the compiler assume that: 1. floating-point math obeys the regular algebraic rules for real numbers (e.g. + and * are associative, x/y == x * (1/y), and (a + b) * c == a * c + b * c), 2. operands to floating-point operations are never NaN or Inf, and 3. +0 and -0 are interchangeable. -ffast-math also defines the __FAST_MATH__ preprocessor macro. Some math libraries recognize this macro and change their behavior. With the exception of -ffp-contract=fast, using any of the options below to disable any of the individual optimizations in -ffast-math will cause __FAST_MATH__ to no longer be set.
  • -Os is like -O2 with extra optimizations to reduce code size.
  • -Oz is like -Os, but tries to reduce code size even further.
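
The statement about the __FAST_MATH__ macro can be checked by dumping the predefined macros with `-dM -E`. This is a minimal sketch: the `CLANG` variable and the `defines_fast_math` helper are hypothetical names of mine, and `-fsigned-zeros` is just one example of an option that turns an individual fast-math optimization back off. I do not list the expected output, since it may depend on the Clang version.

```shell
# Assumption: compiler name; set CLANG=clang++-12 or CLANG=clang++ as needed.
CLANG=${CLANG:-clang}

# Hypothetical helper: succeeds if the macro dump read on stdin
# contains a definition of __FAST_MATH__.
defines_fast_math() {
  grep -q '^#define __FAST_MATH__ '
}

if command -v "$CLANG" >/dev/null 2>&1; then
  for opts in "-O3" "-Ofast" "-Ofast -fsigned-zeros"; do
    # $opts is left unquoted on purpose so that it splits into separate options.
    # -dM -E prints every predefined macro instead of compiling anything.
    if "$CLANG" -xc $opts -dM -E /dev/null 2>/dev/null | defines_fast_math; then
      printf '%s: __FAST_MATH__ is defined\n' "$opts"
    else
      printf '%s: __FAST_MATH__ is not defined\n' "$opts"
    fi
  done
fi
```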

The tables are below.

Option -O0 -O1 -O2 -O3 -Ofast -Os -Oz Description
-cc1: the frontend
-triple x86_64-unknown-linux-gnu: Specify the target triple (architecture)
-emit-obj: Emit native object files
-mrelax-all: (integrated) relax all machine instructions
--mrelax-relocations: Use relaxable ELF relocations, i.e. let the assembler emit relocations that the linker may relax
-disable-free: Disable freeing of memory on exit
-mrelocation-model static: The relocation model to use (what is static?)
-mframe-pointer=all: keep frame pointers
-mframe-pointer=none: Eliminate frame pointers, which point to the base address of the function’s frame
-menable-no-inf: Allow optimization to assume there are no infinities.
-menable-no-nans: Allow optimization to assume there are no NaNs.
-menable-unsafe-fp-math: Allow unsafe floating-point math optimizations which may decrease precision
-fno-signed-zeros: Allow optimizations for floating point arithmetic that ignore the signedness of zero
-mreassociate: Allow reassociation transformations for floating-point instructions
-freciprocal-math: Allow division operations to be reassociated
-fdenormal-fp-math=preserve-sign,preserve-sign: Select which denormal numbers the code is permitted to require.
-ffp-contract=fast: Form fused FP ops (e.g. FMAs): fast (everywhere), on (according to FP_CONTRACT pragma, default), or off (never fuse)
-fmath-errno: Require math functions to indicate errors by setting errno
-fno-rounding-math: Assume the default rounding mode. This is the negation of -frounding-math, which forces floating-point operations to honor the dynamically-set rounding mode.
-ffast-math: Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math
-ffinite-math-only: Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf.
-mconstructor-aliases: enable constructor aliases
-munwind-tables: Generate unwinding tables for all functions
-target-cpu x86-64: Target a specific cpu type
-tune-cpu generic: Tune (schedule and order) the generated instructions for the given CPU, while the set of instructions that may be emitted is still decided by -target-cpu (probably something older, like generic x86-64). Same as -mtune on GCC
-fno-split-dwarf-inlining: Disable -fsplit-dwarf-inlining, which would otherwise provide minimal debug info in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF
-debugger-tuning=gdb: Tune the debug info for the given debugger (here gdb)
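-internal-isystem: Add a directory to the internal system include search path
-internal-externc-isystem: Add a directory to the internal system include search path with implicit extern "C" semantics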
-resource-dir: The directory which holds the compiler resource files
-fdebug-compilation-dir: The compilation directory to embed in the debug info.
-ferror-limit 19: Set the maximum number of errors to emit before stopping (0 = no limit).
-fgnuc-version=4.2.1: Sets various macros to claim compatibility with the given GCC version
-fcolor-diagnostics: Use colors in diagnostics
-faddrsig: Emit an address-significance table
-vectorize-loops: Run the Loop vectorization passes
-vectorize-slp: Run the SLP vectorization passes
-main-file-name: Main file name to use for debug info and source if missing
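
A table of this shape (one row per option, one column per optimization level) does not have to be filled in by hand. The sketch below assumes that the cc1 arguments of each level have already been saved, one per line, in files with the hypothetical names flags-O0.txt, flags-O1.txt and so on; the `flag_matrix` function is likewise my own illustration and not part of Clang.

```shell
levels="O0 O1 O2 O3 Ofast Os Oz"

# Hypothetical helper: print an option-by-level presence table for the
# flags-<level>.txt files (one cc1 argument per line) found in directory $1.
flag_matrix() {
  dir=$1
  printf '%-40s' 'Option'
  for l in $levels; do printf ' %-6s' "-$l"; done
  printf '\n'
  # Every distinct argument seen at any level becomes one row.
  sort -u "$dir"/flags-*.txt | while IFS= read -r flag; do
    printf '%-40s' "$flag"
    for l in $levels; do
      # -x: match the whole line, -F: treat the flag as a fixed string.
      if grep -qxF -- "$flag" "$dir/flags-$l.txt" 2>/dev/null; then
        mark=yes
      else
        mark=-
      fi
      printf ' %-6s' "$mark"
    done
    printf '\n'
  done
}

# Example use, only if the files exist in the current directory.
if [ -e flags-O0.txt ]; then
  flag_matrix .
fi
```

Values such as paths or x86-64 show up as rows of their own, so the output still needs a little manual cleanup before the descriptions are added.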

And Version 2:

Clang version 11.0.0 (Red Hat 11.0.0-1.module+el8.4.0)
Target: x86_64-unknown-linux-gnu
Thread model: posix

It was produced with the same commands, replacing clang++-12 with clang++ in each.

Option -O0 -O1 -O2 -O3 -Ofast -Os -Oz Description
-cc1: the frontend
-triple x86_64-unknown-linux-gnu: Specify the target triple (architecture)
-emit-obj: Emit native object files
-mrelax-all: (integrated) relax all machine instructions
-disable-free: Disable freeing of memory on exit
-disable-llvm-verifier: Don’t run the LLVM IR verifier pass
-discard-value-names: Discard value names when generating LLVM IR
-mrelocation-model static: The relocation model to use (what is static?)
-mframe-pointer=all: keep frame pointers
-mframe-pointer=none: Eliminate frame pointers, which point to the base address of the function’s frame
-menable-no-inf: Allow optimization to assume there are no infinities.
-menable-no-nans: Allow optimization to assume there are no NaNs.
-menable-unsafe-fp-math: Allow unsafe floating-point math optimizations which may decrease precision
-fno-signed-zeros: Allow optimizations for floating point arithmetic that ignore the signedness of zero
-mreassociate: Allow reassociation transformations for floating-point instructions
-freciprocal-math: Allow division operations to be reassociated
-fdenormal-fp-math=preserve-sign,preserve-sign: Select which denormal numbers the code is permitted to require.
-ffp-contract=fast: Form fused FP ops (e.g. FMAs): fast (everywhere), on (according to FP_CONTRACT pragma, default), or off (never fuse)
-fmath-errno: Require math functions to indicate errors by setting errno
-fno-rounding-math: Assume the default rounding mode. This is the negation of -frounding-math, which forces floating-point operations to honor the dynamically-set rounding mode.
-ffast-math: Enable fast-math mode. This option lets the compiler make aggressive, potentially-lossy assumptions about floating-point math
-ffinite-math-only: Allow floating-point optimizations that assume arguments and results are not NaNs or +-Inf.
-mconstructor-aliases: enable constructor aliases
-munwind-tables: Generate unwinding tables for all functions
-target-cpu x86-64: Target a specific cpu type
-fno-split-dwarf-inlining: Disable -fsplit-dwarf-inlining, which would otherwise provide minimal debug info in the object/executable to facilitate online symbolication/stack traces in the absence of .dwo/.dwp files when using Split DWARF
-debugger-tuning=gdb: Tune the debug info for the given debugger (here gdb)
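-internal-isystem: Add a directory to the internal system include search path
-internal-externc-isystem: Add a directory to the internal system include search path with implicit extern "C" semantics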
-resource-dir: The directory which holds the compiler resource files
-fdebug-compilation-dir: The compilation directory to embed in the debug info.
-ferror-limit 19: Set the maximum number of errors to emit before stopping (0 = no limit).
-fgnuc-version=4.2.1: Sets various macros to claim compatibility with the given GCC version
-fcolor-diagnostics: Use colors in diagnostics
-faddrsig: Emit an address-significance table
-vectorize-loops: Run the Loop vectorization passes
-vectorize-slp: Run the SLP vectorization passes
-main-file-name: Main file name to use for debug info and source if missing

Comparing the two tables, the main differences in optimization parameters between the two versions are that Clang 12 introduces the CPU tuning option (-tune-cpu) and relaxable relocations (--mrelax-relocations).
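
A comparison like this can also be done mechanically instead of reading two tables side by side. The following sketch is an illustration under my own assumptions: `flag_delta` and `capture` are hypothetical helper names, and the `OLD` and `NEW` variables default to the two compiler names used in this article. Version-specific paths (the compiler binary, -resource-dir, include directories) will also show up in the output and can be ignored.

```shell
# Assumption: the two compilers to compare, named as in this article.
OLD=${OLD:-clang++}
NEW=${NEW:-clang++-12}

# Hypothetical helper: compare two files holding one cc1 argument per line.
# Prints "+ arg" for arguments only in the second file and "- arg" for
# arguments only in the first.
flag_delta() {
  a=$(mktemp) b=$(mktemp)
  sort -u "$1" > "$a"
  sort -u "$2" > "$b"
  comm -13 "$a" "$b" | sed 's/^/+ /'
  comm -23 "$a" "$b" | sed 's/^/- /'
  rm -f "$a" "$b"
}

# Hypothetical helper: print the cc1 arguments that compiler $1 uses at level $2.
capture() {
  echo 'int;' | "$1" -xc "$2" - -o /dev/null -\#\#\# 2>&1 |
    grep -- '"-cc1"' | grep -o '"[^"]*"' | sed -e 's/^"//' -e 's/"$//'
}

if command -v "$OLD" >/dev/null 2>&1 && command -v "$NEW" >/dev/null 2>&1; then
  work=$(mktemp -d)
  capture "$OLD" -O2 > "$work/old.txt"
  capture "$NEW" -O2 > "$work/new.txt"
  flag_delta "$work/old.txt" "$work/new.txt"
fi
```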