CVE-2012-2100: A fix to fix a fix in ext4

This is not a serious security bug, but interesting.

The story started with an email from an IBM engineer to the ext4 mailing list back in 2009.

Hi,

Found kernel bug - fixpoint divide exception
While working with fsfuzzer

Environment: 2.6.31-rc7
Architecture: s390 and ppc64

------------[ cut here ]------------ 
...
Kernel BUG at 000003e00429d934 [verbose debug info unavailable] 

fixpoint divide exception: 0009 [#1] SMP 
...
Call Trace:
([<000003e00429d87c>] ext4_fill_super+0x13c0/0x2908 [ext4])
  [<00000000000f46f2>] get_sb_bdev+0x13e/0x19c
  [<000003e00429230e>] ext4_get_sb+0x2e/0x40 [ext4]
  [<00000000000f3f98>] vfs_kern_mount+0xc0/0x168
  [<00000000000f40ac>] do_kern_mount+0x58/0x114
  [<000000000010e558>] do_mount+0x798/0x830
  [<000000000010e6a0>] SyS_mount+0xb0/0x100
  [<00000000000266be>] sysc_noemu+0x10/0x16
  [<0000004e53f234e2>] 0x4e53f234e2
Last Breaking-Event-Address:
  [<000003e00429d8d8>] ext4_fill_super+0x141c/0x2908 [ext4]

---[ end trace b5563edf9c0c9b52 ]---
...

The call trace shows that a division by zero occurred when mounting an ext4 filesystem. Note that the bug was triggered using fsfuzzer on s390 and ppc64. Why does this matter? Let’s take a look at the code.

if (!sbi->s_es->s_log_groups_per_flex) {
	sbi->s_log_groups_per_flex = 0;
	return 1;
}

sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;

/* We allocate both existing and potentially added groups */
flex_group_count = ... / groups_per_flex;
size = flex_group_count * sizeof(struct flex_groups);
sbi->s_flex_groups = ext4_kvzalloc(size, GFP_KERNEL);

Since s_log_groups_per_flex is read from disk, it can be anything. The only existing check is to make sure it’s non-zero. Let s_log_groups_per_flex be 32 then. Now groups_per_flex becomes 1 << 32 = 0, and traps the division-by-zero exception when it’s used as a divisor later (see CVE-2009-4307).

Recall that the bug was found using fsfuzzer on s390 and ppc64. Actually, if fsfuzzer were running on x86, the division by zero wouldn’t have been triggered at all, because on x86 1 << 32 evaluates to 1, rather than 0. The reason is that s390 and ppc use 6 bits for the shifting amount, while x86 uses 5.

Division by zero is easy to fix, just to add a zero check groups_per_flex == 0, isn’t it? That’s exactly the initial patch from the IBM engineer who discovered the bug.

The ext4 developer changed the patch a little bit in the fix.

-       if (!sbi->s_es->s_log_groups_per_flex) {
+       sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
+       groups_per_flex = 1 << sbi->s_log_groups_per_flex;
+
+       if (groups_per_flex < 2) {
                sbi->s_log_groups_per_flex = 0;
                return 1;
        }
 
-       sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
-       groups_per_flex = 1 << sbi->s_log_groups_per_flex;
-

This patch basically combines

the existing zero check on s_log_groups_per_flex (that is, groups_per_flex == 1), and
the newly proposed zero check groups_per_flex == 0

into a single check groups_per_flex < 2.

Let’s examine the fixes. First, what if s_log_groups_per_flex is 36? groups_per_flex (1 << 36) is 0 on ppc, while it’s 16 on x86, which will bypass the check. This causes a bizarre inconsistency not only across architectures, but also between s_log_groups_per_flex and groups_per_flex—logically, s_log_groups_per_flex is expected to be the log of groups_per_flex.

A subtler part is that shifting an n-bit integer by n or more bits is undefined in C. That means a compiler can make aggressive assumptions, such as s_log_groups_per_flex < 32, which further implies that groups_per_flex will never be zero. Thus, in the initial patch, the zero check groups_per_flex == 0 would be optimized away by a standard-conforming compiler (at least Clang does so).

Fortunately, the check was changed to groups_per_flex < 2, which hinders the optimization and fixes the division by zero. Of course, future compilers may still break that, for example, by rewriting the check to groups_per_flex == 1.

To sum up, a better fix is to check s_log_groups_per_flex rather than groups_per_flex, so as to avoid the undefined behavior.

BTW, we might need a tighter limit on s_log_groups_per_flex to avoid overflowing some allocation size.