# Xi Wang

# IDIV DoS

A little fun for the last day of 2012: how to crash a program via division? x86’s IDIV instruction traps not only on division by zero, but also on INT_MIN / -1 (signed integer overflow).

## C/C++

Try to compile this little function. It will crash the tcc compiler and the sparse checker.

## SQL

You need a 64-bit PostgreSQL (older than 9.2.2/9.1.7/9.0.11) installed on Windows. Try the following SQL statement. Instead of producing 0 from modulo -1, it will crash your PostgreSQL server. The modulo trick also worked in Firebird. This is fixed by adding a check on the divisor.

Here goes more evil SQL. The first two are straightforward. The third one (multiplication) crashes PostgreSQL because the overflow check is done via division. The fix is simple: do the overflow check before the division, not after.

It’s interesting that the developers fixed the 32-bit division crashes but missed the 64-bit cases. My guess is that the developers did the tests on a 32-bit Windows. In that case, since there’s no 64-bit IDIV instruction, the compiler instead generates a call to a runtime function lldiv, which doesn’t trap on INT_MIN / -1. This would lead to the incorrect conclusion that 64-bit division wouldn’t trap.

## ClamAV bytecode

The ClamAV engine accepts bytecode signatures as extensions to detect new viruses. You can write one to crash ClamAV’s interpreter, as follows. It basically does INT_MIN / -1, with a little obfuscation to prevent ClamAV’s bytecode compiler from optimizing away the division. Then compile the function into bytecode, load it into clamscan, and it will crash the interpreter.

Actually the interpreter does have a sanity check for signed division, which was introduced in 2009. The sanity check doesn’t work because, well, you really should swap op0 and op1 in the second half of the check.

Let me know if you have more stories. Happy new year!
# More Randomness or Less

CVE-2008-0166 is an infamous example of using uninitialized memory for random number generation. A Debian maintainer commented out two lines of code to silence Valgrind, which complained about the uses of uninitialized memory as a "bonus" source of entropy, and this change caused OpenSSL to generate bogus keys on Debian-based systems.

Actually, using uninitialized memory has always been a very bad idea, not only because it confuses developers and tools like Valgrind, but because a smart C compiler will kick you in the teeth. Here is one example from the FreeBSD and Mac OS X libc. The srandomdev() function is used to seed random(); according to its manpage, it is "suitable for cryptographic use." It will first try /dev/random, which is non-blocking on FreeBSD and Mac OS X; if that fails, it falls back to using the current time and pid, with some amazing extra bits. You can see that the seed is computed from the results of gettimeofday() and getpid(), mixed with the value of an uninitialized stack variable junk.

Here’s the corresponding assembly code from Mac OS X 10.6 (Snow Leopard). Everything looks good so far. Now let’s look at the same code from 10.7 (Lion) and 10.8 (Mountain Lion). Wait, the entire seed computation is gone! The results of gettimeofday() and getpid() are not used at all; srandom() is called with some "garbage" value.

I guess Apple has switched from GCC to LLVM for compiling libc in newer Mac OS X. Since the C code contains undefined behavior (the use of the uninitialized variable junk), LLVM optimizes away the computation more aggressively. You should see the same assembly if you compile the FreeBSD libc using LLVM.

There are several interesting questions to explore.

- Is it possible to trigger this fallback by somehow making /dev/random fail on FreeBSD and Mac OS X?
- How random is the srandom() seed now that the code is compiled using LLVM? It is just the value of %edi, which looks like the lower 32 bits of the address of the stack variable tv.
- Does GCC generate "correct" code? Probably not. The last xorl instruction uses %ebx, which likely holds not the value of junk but the file descriptor of /dev/random.

In short, just don’t use uninitialized memory for "more randomness." The use of junk was introduced in 1997. Google shows a bunch of similar usages and fixes.

# CVE-2012-2100: A Fix to Fix a Fix in Ext4

This is not a serious security bug, but it is a funny one. The story started with an email from an IBM engineer to the ext4 mailing list back in 2009.

```
Hi,

Found kernel bug - fixpoint divide exception
While working with fsfuzzer

Environment:
2.6.31-rc7
Architecture: s390 and ppc64

------------[ cut here ]------------
...
Kernel BUG at 000003e00429d934 [verbose debug info unavailable]
fixpoint divide exception: 0009 [#1] SMP
...
Call Trace:
([<000003e00429d87c>] ext4_fill_super+0x13c0/0x2908 [ext4])
 [<00000000000f46f2>] get_sb_bdev+0x13e/0x19c
 [<000003e00429230e>] ext4_get_sb+0x2e/0x40 [ext4]
 [<00000000000f3f98>] vfs_kern_mount+0xc0/0x168
 [<00000000000f40ac>] do_kern_mount+0x58/0x114
 [<000000000010e558>] do_mount+0x798/0x830
 [<000000000010e6a0>] SyS_mount+0xb0/0x100
 [<00000000000266be>] sysc_noemu+0x10/0x16
 [<0000004e53f234e2>] 0x4e53f234e2
Last Breaking-Event-Address:
 [<000003e00429d8d8>] ext4_fill_super+0x141c/0x2908 [ext4]
---[ end trace b5563edf9c0c9b52 ]---
...
```

The call trace shows that a division by zero occurred when mounting an ext4 filesystem. Note that the bug was triggered using fsfuzzer on s390 and ppc64. Why does this matter? Let’s take a look at the code.

Since s_log_groups_per_flex is read from disk, it can be anything. The only existing check is to make sure it’s non-zero. Let s_log_groups_per_flex be 32 then. Now groups_per_flex becomes 1 << 32 = 0, and the division-by-zero exception is triggered when it’s later used as a divisor (see CVE-2009-4307). Recall that the bug was found using fsfuzzer on s390 and ppc64.
Actually, if fsfuzzer had been running on x86, the division by zero wouldn’t have been triggered at all, because on x86 1 << 32 evaluates to 1 rather than 0. The reason is that s390 and ppc use 6 bits of the shift amount, while x86 uses only 5.

Division by zero is easy to fix: just add a zero check for groups_per_flex == 0, isn’t it? That’s exactly the initial patch from the IBM engineer who discovered the bug. The ext4 developer changed the patch a little bit in the fix. The revised patch basically combines

- the existing zero check on s_log_groups_per_flex (that is, groups_per_flex == 1), and
- the newly proposed zero check groups_per_flex == 0

into a single check, groups_per_flex < 2.

Let’s examine the fixes. First, what if s_log_groups_per_flex is 36? groups_per_flex (1 << 36) is 0 on ppc, while it’s 16 on x86, which bypasses the check. This causes a bizarre inconsistency not only across architectures, but also between s_log_groups_per_flex and groups_per_flex — logically, s_log_groups_per_flex is expected to be the log of groups_per_flex.

A more subtle point is that shifting an n-bit integer by n or more bits is undefined in C. That means a compiler may assume s_log_groups_per_flex < 32, which further implies that groups_per_flex can never be zero. Thus, in the initial patch, the zero check groups_per_flex == 0 could be optimized away by a standard-conforming compiler (at least Clang does so). Fortunately, the check was changed to groups_per_flex < 2, which hinders the optimization and fixes the division by zero. Of course, future compilers may still break that, for example, by rewriting the check as groups_per_flex == 1.

To sum up, a better fix is to check s_log_groups_per_flex rather than groups_per_flex, so as to avoid the undefined behavior. BTW, we might need a tighter limit on s_log_groups_per_flex anyway, to avoid overflowing some allocation size.
# Libo 0.1 Released

Just tagged libo 0.1, a library for fast integer overflow detection. The major interface change in this version is that instead of writing smulo32(...) to detect signed 32-bit multiplication overflow, you just call overflow_mul(...), and it will figure out the type for you (using black magic). I have also added a unit test using Google Test. It would be great if you could help generate and test ARM/PowerPC implementations.

To give you a sense of how efficient libo is, below is libo’s implementation of overflow_mul() for long on x86_64, and on ppc. For comparison, this is the implementation from CERT’s IntegerLib.

# Overflow Builtins

Since it’s non-trivial to do integer overflow checking correctly and efficiently (see my email to cfe-dev and the previous post on libo), let’s try compiler support. I just gave it a shot in Clang. My patch is available on the builtin-overflow branch. It introduces bool __overflow_*(T *, T, T) builtin functions, which are easier to understand, less error-prone, and perform better (e.g., only one extra jno instruction on x86 in most cases). Here is an example.

One more example: the signed addition overflow detection mentioned in the previous post. Below is the implementation from CERT’s IntegerLib. Anything suspicious? Despite the clever bit trick, the code is undefined because signed overflow can happen before the check. It also doesn’t work on 64-bit platforms: sizeof(int) should be sizeof(long). But now you can simply write __overflow_sadd, without worrying about undefined behavior or performance.

Interested in overflow builtins, found any bugs, or got better ideas about the interface and function names? Leave your comments or join the discussions at bugzilla and cfe-dev.

# Fast Integer Overflow Detection

Writing efficient overflow checks in C is very challenging (but good brain exercise).
Even libraries written by security experts, such as Microsoft’s SafeInt and CERT’s IntegerLib, can contain broken overflow checks. For another example, I have seen a lot of bit tricks for detecting int + int overflow, many of which are actually questionable because they rely on undefined behavior: at the C level, one cannot compute the signed addition first and check for overflow afterwards, since signed integer overflow is undefined. Guess what C compilers do to the following code. GCC will optimize away the check c < 0, while Clang keeps it. You may try another (broken) form, c < a: this time GCC keeps the check, while Clang removes it.

I am proposing an overflow detection library, libo. Unlike previous ones, it consists of assembly code, and, due to the fact that I am lazy, the assembly code is generated automatically from a short program.

Let’s start with a simple task: writing an array allocator malloc_array(n, size), where n is the number of elements and size is the element size (i.e., the non-zeroing version of calloc). Here’s a popular way. Unfortunately, neither GCC nor Clang is able to optimize away the division (dunno why, though; it looks like a straightforward transformation). If you are crazy about performance, try this trick from the Boehm GC guys. Another way is to promote the integer operation to a wider type, such as 128-bit (assuming size_t is 64-bit); just don’t forget to do the type cast before the multiplication. I am not sure how portable this is. The grsecurity patch uses something similar in its modified kmalloc.

With the new libo, malloc_array could be implemented as follows. With link-time optimization, Clang produces very nice code: compared to the baseline (without overflow checking), it has only one extra jno.

Here is the trick. LLVM internally supports arithmetic-with-overflow operations, based on which the libo generator builds up functions like this. To use libo with Clang (and link-time optimization), one can just use the IR.
To use libo with GCC, one can invoke LLVM’s backends to generate assembly. Assuming LLVM is implemented correctly, we get a correct and efficient libo implementation for every architecture LLVM supports. One downside is that link-time optimization may be difficult with GCC; you would end up with a call to umulo64 rather than having it inlined. How does one "inline" a function written purely in assembly with GCC? Or is it possible to have LLVM generate inline assembly instead? Of course, it would be nicer if compilers provided built-in functions for overflow detection.

# Memory Allocator Security Revisited

C/C++ 101:

- What does malloc(n) return if n is a big number (e.g., -1 or SIZE_MAX)?
- What does calloc(n, size) return if n and size are big numbers?
- What does new T[n] return if n is a big number (assuming -fno-exceptions)?

I believe everyone expects the answer NULL for all three questions, as NULL is used to indicate out of memory. Unfortunately, in reality many memory allocators break (or used to break) these expectations, which could lead to memory corruption and security vulnerabilities, especially if the allocation size can be controlled by an adversary. Here goes a summary of some known and unknown vulnerabilities (notably multiplication, rounding, and pointer overflows) in popular memory allocators.

## GLIBC malloc

An infamous multiplication overflow used to exist in the calloc implementations of GNU libc and Microsoft’s C runtime, as detailed in RUS-CERT Advisory 2002-08:02. The vulnerability can easily be misused to mount buffer overflow attacks. It was fixed in GLIBC in 2002.

## Hoard

I cannot find the revision log of the Hoard allocator, so it’s hard to say what vulnerabilities it used to have. It is easy, though, to exploit the current version (3.8) via a rounding overflow. Below is the buggy code snippet. When sz is (size_t)-1, a big value, the align call overflows and rounds sz up to 0.
Thus, malloc(-1) is basically malloc(0), which could lead to a buffer overflow. Hoard’s calloc implementation is also vulnerable to multiplication overflow. With glibc, Hoard is injected via __malloc_hook, so its calloc won’t be used; however, the vulnerability can be triggered on platforms that do not use glibc, such as Mac OS X.

## jemalloc

The jemalloc allocator is used by the libc of FreeBSD and NetBSD, as well as by Mozilla Firefox. It used to have the calloc multiplication overflow vulnerability; the bug was fixed in 2006. A rounding overflow bug was fixed in 2007. Another interesting signedness bug was fixed in 2008. Below is the simplified code snippet. Note that size is unsigned and incr is signed. When size is big enough, incr is misinterpreted as negative and causes the sbrk system call to shrink the memory.

## Bionic malloc

Bionic, the Android libc, is derived from the BSD libc. When libc.debug.malloc is set, allocation calls are rerouted to its own debugging library. Again, the debugging library had calloc multiplication overflows (CVE-2009-0607). Also, its malloc implementation still has multiple rounding overflows, at least in chk_malloc and leak_malloc. Let’s hope no production phone ships the debugging library.

## TCMalloc

Google’s TCMalloc repeated the calloc multiplication overflow bug and got it fixed in 2005. I didn’t spot any serious rounding issues in TCMalloc, just a slip in ReportLargeAlloc: given malloc(-1), the error message says "tcmalloc: large alloc 0 bytes == (nil)", because the size to be printed overflows.

## APR pool

APR pool was developed for the Apache web server. It is also adopted by other projects such as Subversion. The interface is similar to that of allocate-and-free allocators, but takes one extra parameter specifying which pool to allocate from. APR pool had a series of interesting pointer arithmetic fixes, listed as follows.
In Aug 2002, the sanity check p + size < q was changed to size < q - p, where the two pointers p (i.e., node->first_avail) and q (i.e., node->endp) point to the start and the end of a memory block, respectively. Over a month later (Sep 2002), the check was reverted to p + size < q. One week later (Oct 2002), the check was again changed to size < q - p. After over five years (Apr 2008), the check was revised to size <= q - p with an off-by-one fix.

Here p + size < q is more dangerous than size < q - p. One obvious reason is that p + size could wrap around to a small pointer given a big size, while the latter check doesn’t have the pointer overflow issue. But how about adding a wrap check p + size < p? A subtle problem is that pointer overflow (or, more precisely, forming an out-of-bounds pointer) is undefined in C, which means that the compiler can do anything to the wrap check p + size < p, such as optimizing it away (VU#162289). I tried compiling the wrap check with GCC 4.6 and Clang 3.0 on x86: it’s optimized into extracting the sign of size, without even looking at the value of p.

The size rounding vulnerability was also discovered and fixed in APR pool (CVE-2009-2412) in 2009.

## boost::pool

Boost Pool has the multiplication overflow issue in its ordered_malloc implementation. We can see that total_req_size may wrap around to a small number given a big n.

## Boehm GC

Boehm GC is probably the most popular open-source garbage collector. It provides C allocation calls and overloads the C++ new operator. Boehm GC’s calloc implementation, in both malloc.c and its documentation, simply redirects calloc(n, size) to GC_MALLOC((n) * (size)), and thus is vulnerable to multiplication overflow. It also has the size rounding vulnerability, so GC_MALLOC(-1) doesn’t return a null pointer either.

## Safe by Design

Some allocators are designed to be used in a "safe" environment. For example, LLVM’s BumpPtrAllocator doesn’t consider out-of-memory or overflow issues. kLIBC is another example.
## Compiler

The C++ array allocation new T[n] has a potential integer overflow vulnerability if n is not limited correctly, which could lead to a buffer overflow. This is similar to calloc(n, sizeof(T)). The code generated by GCC simply calculates the allocation size as n * sizeof(T) without any overflow checks, though GCC developers have been discussing the problem for a long time. Clang generates "safe" code that defends against integer overflow attacks by setting the allocation size to -1 if the multiplication overflows. This implies an assumption that the underlying allocator returns a null pointer given such a huge size. As we have seen, that assumption doesn’t hold well in practice.

# Solving Doodle Fit

A year ago I introduced the game Doodle Fit to my roommate Victor. We were playing it all day long. To stop the addiction, we wrote a program to find the answers, which successfully destroyed our interest in playing the game. I got some time to rethink the problem last night. Here goes an interesting alternative way to solve Doodle Fit, using 0-1 integer programming.

The rule of the game is simple: fit a set of given blocks into a given shape (no rotation). For example, try to fit the three blocks into the two-square shape. The solution is pretty easy: put the red block in the bottom-right corner and the green block at the left, leaving the blue block in the middle.

Let’s see how to model the game as a set of constraints. We use $p_{color}(row, col)$ to denote whether the top-left corner of the $color$ block is placed at $(row, col)$: 1 for yes and 0 for no. Take the red block for an example. Since it can only be placed at two cells, $(1, 1)$ or $(2, 3)$, one and only one of $p_r(1, 1)$ and $p_r(2, 3)$ is 1. In other words, their sum must be 1. We have similar constraints for the other two blocks. Additionally, each cell must be covered by one and only one block.
For example, the top-left cell $(1, 1)$ can be covered by either the green block placed at $(1, 1)$ or the blue block placed at $(1, 1)$; it cannot be covered by the red block. So the sum of $p_g(1, 1)$ and $p_b(1, 1)$ must be 1. We have such a constraint for each cell.

That’s all we need. Feed these constraints to a solver that supports 0-1 integer programming, e.g., GLPK or lp_solve, and you get back an answer like $p_r(2, 3) = p_g(1, 1) = p_b(1, 2) = 1$, indicating where to put the blocks.

I wrote a script doodle-fit.py to automate the steps. Prepare an input file two specifying the shape and the blocks, separated by ---, as follows.

```
##
####
  ##
---
 *
**
---
*
*
---
*
**
```

Then generate the constraints using the script, which writes them to two.mod in the GNU MathProg language.

```
python doodle-fit.py < two > two.mod
```

Invoke a solver to get a solution. I use GLPK here. You may try adding --minisat to glpsol to solve the constraints with a SAT solver instead.

```
$ glpsol --math two.mod -w two.sol
GLPSOL: GLPK LP/MIP Solver, v4.47
Parameter(s) specified in the command line:
 --math two.mod -w two.sol
Reading model section from two.mod...
24 lines were read
Generating b0...
Generating b1...
Generating b2...
Generating c00...
Generating c01...
Generating c10...
Generating c11...
Generating c12...
Generating c13...
Generating c22...
Generating c23...
Model has been successfully generated
GLPK Integer Optimizer, v4.47
11 rows, 9 columns, 32 non-zeros
9 integer variables, all of which are binary
Preprocessing...
11 rows, 9 columns, 32 non-zeros
9 integer variables, all of which are binary
Scaling...
 A: min|aij| = 1.000e+00  max|aij| = 1.000e+00  ratio = 1.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part = 8
Solving LP relaxation...
GLPK Simplex Optimizer, v4.47
11 rows, 9 columns, 32 non-zeros
*     0: obj =  0.000000000e+00  infeas = 0.000e+00 (3)
OPTIMAL SOLUTION FOUND
Integer optimization begins...
+     0: mip =     not found yet >=              -inf        (1; 0)
+     0: >>>>>  0.000000000e+00 >=  0.000000000e+00   0.0% (1; 0)
+     0: mip =  0.000000000e+00 >=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used:   0.0 secs
Memory used: 0.1 Mb (121133 bytes)
Writing MIP solution to `two.sol'...
22 lines were written
```

Make sure you see something like OPTIMAL SOLUTION FOUND in the output. If the SAT solver is used, you should see SATISFIABLE instead. Otherwise, go back and check the input file, or send me a bug report.

Finally, display the solution two.sol using the script again.

```
$ python doodle-fit.py two.sol < two
12
1220
00
```

Now you can "enjoy" the game.

# CVE-2012-0038: XFS ACL Count Integer Overflow

This is a classical integer overflow in the Linux kernel’s XFS filesystem, which could further lead to a buffer overflow. To exploit it, an attacker needs to craft a corrupted XFS image, for example, on a USB flash drive or a remotely mounted filesystem.

The vulnerability seems to have been introduced in commit ef14f0c1 in July 2009.

Note that count is passed into posix_acl_alloc() without any validation; it is converted from aclp->acl_cnt, which is directly read from disk.

Given a large count, the allocation size for kmalloc() would wrap around to a small integer. Back in xfs_acl_from_disk(), this would cause an out-of-bounds write in the subsequent loop.

Later I realized that there’s a recent commit in XFS’s git repository that tried to fix the problem.

However, the patch doesn’t really work: count is a signed integer, so an attacker could feed in a negative value and bypass the sanity check. Hence my follow-up patch.

BTW, Linux 3.2 contains only the first patch and remains vulnerable.

# Lighttpd: Say No to Digest Authentication

There are a lot of blogs/tutorials/notes on how to set up digest access authentication in lighttpd. Well, one thing they may miss is that it has been simply broken for several years.

Digest access authentication is specified by RFC 2617. It is sort of a middle ground between SSL and basic authentication — lightweight and easy to set up, yet resistant to replay attacks.

It works as follows. When the client first requests a protected resource, the server responds with 401 Unauthorized, along with a realm and a nonce. The client then sends a second request carrying a response value for authentication, which is computed from the password, the nonce, and other fields, as follows.
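Paraphrased from RFC 2617, the computation is roughly (when a qop is specified):

```
HA1      = MD5(username ":" realm ":" password)
HA2      = MD5(method ":" digest-uri)
response = MD5(HA1 ":" nonce ":" nc ":" cnonce ":" qop ":" HA2)
```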

Here nc is short for nonce count and cnonce is the client-side nonce, both of which are optional.

An attacker eavesdropping on the traffic cannot guess the password, nor can the attacker replay the request to the server, since the nonce value differs each time.

The problem with lighttpd’s mod_auth is that it doesn’t verify that the nonce in the client’s second request is one it generated earlier, which means that an attacker can use any nonce to bypass the digest authentication. In this case, digest authentication is basically equivalent to basic authentication.

The lighttpd developers pointed me to their wiki.

> The implementation of digest method is currently not completely
> compliant with the standard as it still allows a replay attack.
> (i.e. not secure)

Seriously, do you really have to teach people how to configure digest authentication while hiding this sentence in a pile of small bullet points? ;-)