Fast integer overflow detection
Writing efficient overflow checks in C is very challenging (but good brain exercise). Even libraries written by security experts, such as Microsoft’s SafeInt and CERT’s IntegerLib, could contain broken overflow checks.
For another example, I have seen a lot of bit tricks for detecting
int
+ int
overflow, many of which are actually questionable
because they rely on undefined behavior. At C-language level one
cannot compute the signed addition first before overflow checking
since signed integer overflow is undefined. Guess what C compilers
would do to the following code.
int sum(int a, int b)
{
int c;
assert(a >= 0);
assert(b >= 0);
c = a + b; /* both a and b are non-negative */
if (c < 0) /* overflow? */
abort();
return c;
}
GCC will optimize away the check c < 0
, while Clang keeps
it. You may try another (broken) form c < a
:
this time GCC keeps the check, while Clang removes it.
I am proposing an overflow detection library, libo. Unlike previous ones, it consists of assembly code, and due to the fact that I am lazy, the assembly code is generated automatically from a short program.
Let’s start with a simple task: writing an array allocator
malloc_array(n, size)
, where n
is the number of elements
and size
is the element size
(i.e., the non-zeroing version of calloc
).
Here’s a popular way.
void *malloc_array(size_t n, size_t size)
{
if (size && n > SIZE_MAX / size)
return NULL;
return malloc(n * size);
}
Unfortunately neither GCC nor Clang is able to optimize away the division (dunno why though; looks like a straightforward transformation).
malloc_array: # @malloc_array
.cfi_startproc
# BB#0: # %entry
testq %rsi, %rsi
je .LBB0_3
# BB#1: # %land.rhs
movq $-1, %rax
xorl %edx, %edx
divq %rsi
cmpq %rdi, %rax
jae .LBB0_3
# BB#2: # %return
xorl %eax, %eax
ret
.LBB0_3: # %if.end
imulq %rdi, %rsi
movq %rsi, %rdi
jmp malloc # TAILCALL
.Ltmp0:
.size malloc_array, .Ltmp0-malloc_array
.cfi_endproc
If you are crazy about performance, try this trick from the Boehm GC guys.
#define SQRT_SIZE_MAX ((1U << (sizeof(size_t) * 8 / 2)) - 1)
if ((size | n) > SQRT_SIZE_MAX /* fast test */
&& size && n > SIZE_MAX / size)
return NULL;
Another way is to promote the integer operation to a wider type,
such as 128-bit (assuming size_t
is 64-bit); just don’t
forget to do the type cast before multiplication.
I am not sure how portable this is. The
grsecurity patch uses something similar
to modify kmalloc
.
void *malloc_array(size_t n, size_t size)
{
__uint128_t bytes = (__uint128_t)n * (__uint128_t)bytes;
if (bytes > SIZE_MAX)
return NULL;
return malloc((size_t)bytes);
}
With the new libo, malloc_array
could be implemented as follows.
void *malloc_array(size_t n, size_t size)
{
size_t bytes;
if (umuloz(n, size, &bytes))
return NULL;
return malloc(bytes);
}
With link-time optimization, Clang produces very nice code:
compared to the baseline code (without overflow checking),
it has only one more jno
.
malloc_array: # @malloc_array
.cfi_startproc
# BB#0: # %entry
movq %rdi, %rax
mulq %rsi
jno .LBB0_2
# BB#1: # %return
xorl %eax, %eax
ret
.LBB0_2: # %if.end
movq %rax, %rdi
jmp malloc # TAILCALL
.Ltmp0:
.size malloc_array, .Ltmp0-malloc_array
.cfi_endproc
Here is the trick. LLVM internally supports arithmetic with overflow operations, based on which the libo generator builds up functions like this:
define i32 @umulo64(i64, i64, i64*) alwaysinline {
entry:
%3 = call { i64, i1 } @llvm.umul.with.overflow.i64(i64 %0, i64 %1)
%4 = extractvalue { i64, i1 } %3, 0
store i64 %4, i64* %2
%5 = extractvalue { i64, i1 } %3, 1
%6 = sext i1 %5 to i32
ret i32 %6
}
To use libo with Clang (and link-time optimization), one can just use the IR. To use libo with GCC, one can invoke LLVM’s backends to generate assembly. Assuming LLVM is implemented correctly, we get a correct and efficient libo implementation for all architectures LLVM supports.
One downside is that it may be difficult to do link-time optimization
when using GCC; this would end up with having a call to umulo64
,
rather than inlining it. How to “inline” a function purely written
in assembly code using GCC? Or is it possible to have LLVM generate
inline assembly instead? Of course it would be nicer if compilers
provide built-in functions for overflow detection.