CVE-2012-2100: A fix to fix a fix in ext4
This is not a serious security bug, but it is an interesting one.
The story started with an email from an IBM engineer to the ext4 mailing list back in 2009.
Hi,
Found kernel bug - fixpoint divide exception
While working with fsfuzzer
Environment: 2.6.31-rc7
Architecture: s390 and ppc64
------------[ cut here ]------------
...
Kernel BUG at 000003e00429d934 [verbose debug info unavailable]
fixpoint divide exception: 0009 [#1] SMP
...
Call Trace:
([<000003e00429d87c>] ext4_fill_super+0x13c0/0x2908 [ext4])
[<00000000000f46f2>] get_sb_bdev+0x13e/0x19c
[<000003e00429230e>] ext4_get_sb+0x2e/0x40 [ext4]
[<00000000000f3f98>] vfs_kern_mount+0xc0/0x168
[<00000000000f40ac>] do_kern_mount+0x58/0x114
[<000000000010e558>] do_mount+0x798/0x830
[<000000000010e6a0>] SyS_mount+0xb0/0x100
[<00000000000266be>] sysc_noemu+0x10/0x16
[<0000004e53f234e2>] 0x4e53f234e2
Last Breaking-Event-Address:
[<000003e00429d8d8>] ext4_fill_super+0x141c/0x2908 [ext4]
---[ end trace b5563edf9c0c9b52 ]---
...
The call trace shows that a division by zero occurred when mounting an ext4 filesystem. Note that the bug was triggered using fsfuzzer on s390 and ppc64. Why does this matter? Let’s take a look at the code.
if (!sbi->s_es->s_log_groups_per_flex) {
    sbi->s_log_groups_per_flex = 0;
    return 1;
}

sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;

/* We allocate both existing and potentially added groups */
flex_group_count = ... / groups_per_flex;
size = flex_group_count * sizeof(struct flex_groups);
sbi->s_flex_groups = ext4_kvzalloc(size, GFP_KERNEL);
Since s_log_groups_per_flex is read from disk, it can be anything. The only existing check is to make sure it's non-zero. Let s_log_groups_per_flex be 32, then. Now groups_per_flex becomes 1 << 32 = 0, and traps the division-by-zero exception when it's used as a divisor later (see CVE-2009-4307).
Recall that the bug was found using fsfuzzer on s390 and ppc64. Had fsfuzzer been running on x86, the division by zero wouldn't have been triggered at all, because on x86 1 << 32 evaluates to 1 rather than 0. The reason is that the shift instructions on s390 and ppc use 6 bits of the shift amount, while x86 uses only 5, so a shift count of 32 is masked down to 0.
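To see the difference concretely, here is a minimal standalone program (my own sketch, not from the thread) that performs the same shift with a runtime-variable count, so the CPU's shift instruction decides the outcome:

#include <stdio.h>

int main(void)
{
    /* volatile keeps the compiler from folding the shift at compile
     * time; the shift itself is undefined behavior in C, so the CPU's
     * shift instruction determines what we observe. */
    volatile unsigned int n = 32;
    unsigned int groups_per_flex = 1 << n;

    /* Typically prints 1 on x86 (shift count masked to 5 bits) and
     * 0 on s390/ppc64 (6 significant bits of the count are used). */
    printf("1 << 32 = %u\n", groups_per_flex);
    return 0;
}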
Division by zero is easy to fix: just add a zero check for groups_per_flex == 0, isn't it? That's exactly the initial patch from the IBM engineer who discovered the bug.
The ext4 developer changed the patch a bit in the committed fix.
- if (!sbi->s_es->s_log_groups_per_flex) {
+ sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
+ groups_per_flex = 1 << sbi->s_log_groups_per_flex;
+
+ if (groups_per_flex < 2) {
      sbi->s_log_groups_per_flex = 0;
      return 1;
  }
- sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
- groups_per_flex = 1 << sbi->s_log_groups_per_flex;
-
This patch basically combines

- the existing zero check on s_log_groups_per_flex (that is, groups_per_flex == 1), and
- the newly proposed zero check groups_per_flex == 0

into a single check groups_per_flex < 2.
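Reconstructed from the diff above, the post-patch code reads:

sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;
groups_per_flex = 1 << sbi->s_log_groups_per_flex;

/* One comparison rejects both groups_per_flex == 0 (the new
 * division-by-zero case) and groups_per_flex == 1 (the old
 * !s_log_groups_per_flex case). */
if (groups_per_flex < 2) {
    sbi->s_log_groups_per_flex = 0;
    return 1;
}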
Let's examine the fixes. First, what if s_log_groups_per_flex is 36? groups_per_flex (1 << 36) is 0 on ppc, while it's 16 on x86 (the shift count 36 is masked down to 4), which will bypass the check. This causes a bizarre inconsistency not only across architectures, but also between s_log_groups_per_flex and groups_per_flex: logically, s_log_groups_per_flex is expected to be the log of groups_per_flex.
A subtler issue is that shifting an n-bit integer by n or more bits is undefined behavior in C. That means a compiler can make aggressive assumptions, such as s_log_groups_per_flex < 32, which further implies that groups_per_flex will never be zero. Thus, in the initial patch, the zero check groups_per_flex == 0 would be optimized away by a standard-conforming compiler (at least Clang does so).
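Here is a standalone sketch of that hazard, a hypothetical function distilled from the initial patch's check; per the observation above, an optimizing Clang can drop the branch:

/* Because 1 << n is undefined for n >= 32, the compiler may assume
 * n < 32, conclude that (1 << n) is a nonzero power of two, and
 * delete the check below entirely. */
unsigned int groups_per_flex_checked(unsigned int n)
{
    unsigned int groups_per_flex = 1 << n;

    if (groups_per_flex == 0)   /* may be optimized away */
        return 0;               /* error path never taken */

    return groups_per_flex;
}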
Fortunately, the check was changed to groups_per_flex < 2, which hinders the optimization and fixes the division by zero. Of course, future compilers may still break that, for example, by rewriting the check to groups_per_flex == 1.
To sum up, a better fix is to check s_log_groups_per_flex rather than groups_per_flex, so as to avoid the undefined behavior.
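A sketch of that approach, in the same style as the snippets above (the exact bounds are my assumption, derived from the 32-bit shift limit): validate the on-disk field before ever shifting.

sbi->s_log_groups_per_flex = sbi->s_es->s_log_groups_per_flex;

/* Reject anything outside 1..31: 0 reproduces the old degenerate
 * case, and 32 or more would make the shift itself undefined. */
if (sbi->s_log_groups_per_flex < 1 || sbi->s_log_groups_per_flex > 31) {
    sbi->s_log_groups_per_flex = 0;
    return 1;
}
groups_per_flex = 1 << sbi->s_log_groups_per_flex;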
BTW, we might need a tighter limit on s_log_groups_per_flex to keep the size computation flex_group_count * sizeof(struct flex_groups) from overflowing.
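For instance, one could guard the multiplication explicitly; this helper is hypothetical and only illustrates the concern:

#include <stddef.h>
#include <stdint.h>

/* Hypothetical overflow-checked version of the size computation:
 * returns 0 (and the caller should fail the mount) when
 * flex_group_count * elem_size would not fit in a size_t. */
static size_t flex_groups_alloc_size(size_t flex_group_count, size_t elem_size)
{
    if (elem_size != 0 && flex_group_count > SIZE_MAX / elem_size)
        return 0;
    return flex_group_count * elem_size;
}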