Stare at the C code below and guess what could go wrong. It’s originally from a CPU simulator, implementing a 16×16⇒32 unsigned multiplication. Thanks to Mattias Engdegård at Intel for sharing the story.
1 2 3 4 5
To be more specific, let’s assume x86-64.
Is it possible that for some
the result of
mul(a, b) changes when compiling the code with
different optimization levels using gcc?
Spoiler alert: here’s what will happen.
When compiling the code with
gcc -O0, you’ll get:
$ ./mul 65535 65535 65535 * 65535 = 4294836225 (0xfffe0001)
Hope you’re not expecting the multiplication to overflow and output
1 here, which would be the case in Go but not in C/C++.
We’ll come back to that later.
gcc -O2, you’ll get this:
$ ./mul 65535 65535 65535 * 65535 = 18446744073709420545 (0xfffffffffffe0001)
This doesn’t look pretty, does it?
And here’s what gcc 4.7.3/4.8.1 actually emits with
1 2 3 4 5
Wait, the C code uses unsigned integers only.
What’s this sign-extension instruction
cdqe in Intel/AMD manuals) doing here?
Is this a gcc bug?
even though emitting
cltq is sort of creepy,
it doesn’t violate the C standard.
There’s a very subtle bug in the
I’ll show how gcc emits
cltq by exploiting two odd C
rules, integer promotions and undefined behavior, and discuss
possible ways to avoid/detect such problems.
How gcc emits cltq
To see what gcc does, invoke it with
-fdump-tree-all to dump the
IR after each pass.
Here’s the (simplified) output after vrp (value range propagation).
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
First of all,
b are of type
which is smaller than
int (assuming 32-bit
According to the rule of integer promotions (C11, 22.214.171.124/2),
they are converted to
int for multiplication.
In other words,
a * b is actually
(int)a * (int)b,
a signed multiplication.
Second, since signed integer overflow is undefined behavior in C,
gcc assumes the signed multiplication doesn’t overflow. This is shown
in the range information: the product
_5 is considered non-negative.
_5 are indistinguishable.
Now the sign conversion
c = (uint32_t) _5 becomes dead code, and gcc’s next
pass dce (dead code elimination) removes it.
The return value
_7 is basically
(uint64_t)((int)a * (int)b).
Note that this is a sign extension from
which will be lowered to
This is how
cltq pops out.
Fixes and workarounds
One way to fix the C code is to convert
int is 32-bits).
1 2 3 4 5
Instead of fixing the code, another approach is to add a workaround
compiler option. Here’s what gcc 4.7.3/4.8.1 emits with
1 2 3 4
Everything looks good now. However, if you use
-fwrapv, congratulations, you’ll get bitten by
1 2 3 4 5
As a comparison, here’s what clang 3.3 emits with
1 2 3
Don’t worry, no
Possible detection methods
In our STACK paper,
we use the term unstable code to refer to
program fragments being optimized away by the compiler due to undefined behavior.
STACK is a static checker
for detecting unstable code.
While it works well for other cases,
unfortunately, STACK doesn’t work here.
The main reason is that STACK accepts LLVM IR,
which has no sign conversion instruction;
the code being optimized away,
the sign conversion
c = (uint32_t) _5 as in gcc,
doesn’t exist in LLVM IR:
1 2 3 4 5 6 7 8
Since in LLVM IR there’s no instruction to be optimized away, STACK doesn’t report any warning. One could port STACK to gcc to catch this case though.
Another possible way is to extend STACK with an oracle that rewrites
zext (zero extension) to
sext (sign extension).
The idea is that
if the code after rewriting is equivalent to the original only
under the assumption of undefined behavior,
then this rewriting exploits undefined behavior,
and the code may be unstable.
I coded up a prototype and it did work for this
sign branch in STACK’s git repository if interested).
But it reported too many false positives.
After all, the compiler being able to remove code often indicates a bug,
while being able to flip between
Maybe we need better heuristics to reduce false positives.
There are a few handy tools for finding undefined behavior. For example, run the C code with Frama-C’s value analysis and you’ll see the following warning:
$ frama-c -val mul.c [value] Analyzing a complete application starting at main ... mul.c:3:[kernel] warning: signed overflow. assert (int)a*(int)b ≤ 2147483647; mul.c:3:[value] assigning non deterministic value for the first time ...
You can also compile it with clang’s
You need an input to trigger the undefined behavior, such as:
$ ./mul 65535 65535 mul.c:3:17: runtime error: signed integer overflow : 65535 * 65535 cannot be represented in type 'int'
Though these tools won’t tell you what code may be “miscompiled” as STACK does, fixing the warnings will reduce the chances of being bitten by the compiler.
This is one of my favorite unstable code examples. The code is very short, uses unsigned integers only, and yet confusing. Fixing the problem, especially by using gcc’s workaround options, is tricky.
Update from Sep 18th:
Add results from Frama-C and clang’s
Thanks to David Mentré and John Regehr.