First, I would like to thank Dan Rosenberg for his interesting information on the ROP mitigation mechanism in Windows 8. I also greatly appreciate the effort of my young colleague, Nguyen Hong Son, in turning Dan’s method into a generic ROP chain. However, that ROP chain has two obvious issues:
- First, it requires EAX to point to a valid stack address before the ROP chain starts; otherwise the chain cannot be used. In practice, there are many cases where no register holds the original ESP value (e.g. when the stack is pivoted with a MOV ESP, R32 instruction instead of XCHG). Even when a register other than EAX holds a valid ESP value, moving it into EAX with ROP gadgets is not always easy. A generic ROP chain that can be widely used on Windows 8 must solve this issue.
- The second issue is the length of the ROP chain. While generic ROP chains on Windows 7 are rather short (18 dwords for Corelanc0d3r’s new ROP chain and 22 dwords for Sayonara’s ROP chain), this one has about 100 dwords. My colleague, or anyone else following Dan’s method, could shorten it by optimizing the use of ROP gadgets, but I believe it would still be rather long compared with the Windows 7 chains. Since exploit code sometimes faces strict size constraints, the length of a ROP chain matters, just as shellcode length does. Besides, I think a short and tidy ROP chain is simply more elegant.
Focusing on the protection mechanism of Windows 8, I have built a new generic ROP chain based on my own method. More importantly, it solves both of the issues above.
My ROP chain, built on the same library (msvcr71.dll) as the chains mentioned above, has 3 stages:
Stage 1: Determine the valid stack range
This stage makes EAX point to a stack range that is valid under the current Windows 8 check, which solves the first issue mentioned above.
Going back to Dan Rosenberg’s idea: in his sample, he restores the old stack (the one in use before the stack pivot) to bypass the check. That stack range was in normal use before, so it is obviously valid. This is a good solution if some register is available to back up the old stack pointer so that it can be restored. Here, however, I am discussing the case where no register meets that requirement.
Now look at the Windows check mechanism: an ESP value is valid if it lies in the range between “stack min” (FS: ) and “stack max” (FS:). So why not use “stack min” itself? It obviously passes the check. If I can get that value, I have a huge valid stack range to use, so that is the value I look for.
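The range check described above can be modelled in a few lines of Python. This is only a sketch of the idea, not the actual Windows 8 code: the function name `esp_is_valid`, the example addresses, and the choice of an inclusive lower bound and an exclusive upper bound are all assumptions made for illustration.

```python
def esp_is_valid(esp: int, stack_min: int, stack_max: int) -> bool:
    """Model of the check: ESP must lie between "stack min" and "stack max".

    Assumption: the lower bound is inclusive and the upper bound exclusive;
    the real check may treat the boundaries differently.
    """
    return stack_min <= esp < stack_max


# Hypothetical values, for illustration only.
STACK_MIN = 0x0012D000
STACK_MAX = 0x00130000
PIVOTED_ESP = 0x0C0C0C0C  # e.g. ESP after pivoting into attacker-controlled data

print(esp_is_valid(PIVOTED_ESP, STACK_MIN, STACK_MAX))
print(esp_is_valid(STACK_MIN, STACK_MIN, STACK_MAX))
```

Under these assumptions the pivoted ESP fails the check, while an ESP equal to “stack min” passes it, which is why that value is worth obtaining.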
Searching msvcr71.dll for ROP gadgets that interact directly with FS: gives no results. This is understandable, since the StackBase and StackTop values are usually accessed through the TEB structure. So I looked for gadgets referring to &(TEB) ( FS: ) and got only one result, at 0x7c34d38f :
At first glance, this instruction sequence seems unsuitable as a ROP gadget because of the many calculations and jump instructions before RETN. However, it is the only result in msvcr71.dll that references FS:, so I analyzed it carefully. The code appears to verify whether EBX lies between StackBase and StackTop. Luckily, I found that if the registers are arranged appropriately, I can store the FS: value (or the StackBase pointed to by ECX) wherever I want ([EBP+8] or [EBP-4]) and return cleanly to the ROP chain. The following is Stage 1, with 13 dwords:
Stage 2: Copy a Win7 ROP chain to the valid stack range, then return to it.
Here, I solve the length issue. Dan’s ROP chain is much longer than the Windows 7 chains because it needs more calculations, more data movement between registers, and more writes to memory, and the limited set of available ROP gadgets often complicates things further. The Windows 7 chains, by contrast, use simple calculations, make good use of PUSHAD to push data onto the stack, and then perform the function call.
Since the stack check in Windows 8 applies only to functions that manipulate virtual memory (such as VirtualProtect), I can freely set up and call a data-copying function (memcpy, for example) without worrying about the validity of ESP.
So, in this stage, I use PUSHAD (in the style of the generic Windows 7 ROP chains) to get a short piece of ROP that copies the whole of Stage 3 (a Windows 7 ROP chain) and the shellcode into the valid stack range (pointed to by EAX), then returns to it.
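To see why PUSHAD keeps such a chain short, it helps to recall the order in which it stores the registers. The sketch below models that order in Python: PUSHAD pushes EAX, ECX, EDX, EBX, the original ESP, EBP, ESI, and EDI, so EDI ends up on top of the stack and is consumed by the next RETN. The register values and the helper names (`pushad`, `frame_after_retn`) are hypothetical, and the sketch does not reproduce the register arrangement actually used in Stage 2.

```python
# Order in which PUSHAD pushes the general-purpose registers.
PUSHAD_ORDER = ("eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi")


def pushad(regs: dict) -> list:
    """Return the 8 dwords PUSHAD writes, lowest address (top of stack) first.

    The "esp" entry is the value ESP had before PUSHAD executed.
    """
    # EDI is pushed last, so it sits at the lowest address.
    return [regs[name] for name in reversed(PUSHAD_ORDER)]


def frame_after_retn(layout: list) -> dict:
    """Describe what a RETN placed right after PUSHAD sees.

    RETN pops the top dword (EDI) into EIP. If that dword is a function
    address, the next dword acts as its return address and the remaining
    dwords are its stack arguments.
    """
    return {"target": layout[0], "return_address": layout[1], "args": layout[2:]}


# Hypothetical marker values, chosen only to make the slots easy to spot.
REGS = {"eax": 0xAAAAAAAA, "ecx": 0xCCCCCCCC, "edx": 0xDDDDDDDD,
        "ebx": 0xBBBBBBBB, "esp": 0x0C0C0C10, "ebp": 0xEEEEEEEE,
        "esi": 0x51515151, "edi": 0xD1D1D1D1}

frame = frame_after_retn(pushad(REGS))
print(hex(frame["target"]), hex(frame["return_address"]),
      [hex(a) for a in frame["args"]])
```

With a single PUSHAD, eight prepared register values become a complete call frame (target, return address, and arguments), which is what lets the Windows 7 style chains avoid long sequences of memory writes.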
Stage 2 has 8 dwords:
Stage 3: A usual ROP chain on Windows 7
The whole of Stage 3, and the shellcode following it, is stored in the valid stack range. Its execution on Windows 8 is therefore the same as on Windows 7 and earlier versions.
So all I need here is a ROP chain that runs smoothly on Windows 7. I took the smallest ROP chain by Corelanc0d3r (updated in Oct 2011), with 18 dwords, as a reference and wrote a new version with only 14 dwords (Cool, isn’t it?). Here is the universal ROP chain:
- Combining the 3 stages above gives a universal ROP chain that runs smoothly on Windows 8 with a length of 35 dwords (13+8+14).
- If EAX already points to a valid stack range, Stage 1 can be skipped, and the ROP chain needs only 22 dwords (8+14) to run on Windows 8.
- For Windows 7 and earlier, my 14-dword Stage 3 is a generic ROP chain that is extremely short but complete.
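The dword counts in the list above can be checked with a short Python sketch, which also shows how a list of dwords is typically packed into little-endian bytes for a 32-bit payload. The helper names and the 0x41414141 filler are placeholders of my choosing, not real gadget addresses.

```python
import struct

# Stage lengths in dwords, as stated above.
STAGE_DWORDS = {"stage1": 13, "stage2": 8, "stage3": 14}


def chain_length(eax_already_valid: bool = False) -> int:
    """Total chain length in dwords; Stage 1 is skipped when EAX is already valid."""
    stages = ["stage2", "stage3"] if eax_already_valid else ["stage1", "stage2", "stage3"]
    return sum(STAGE_DWORDS[s] for s in stages)


def pack_dwords(dwords: list) -> bytes:
    """Pack 32-bit values as little-endian bytes, the layout x86 expects."""
    return struct.pack("<%dI" % len(dwords), *dwords)


# Placeholder chain: filler values standing in for gadget addresses.
placeholder_chain = [0x41414141] * chain_length()
payload = pack_dwords(placeholder_chain)
print(chain_length(), chain_length(eax_already_valid=True), len(payload))
```

At 4 bytes per dword, the full 35-dword chain occupies 140 bytes before the shellcode, and the 22-dword variant occupies 88 bytes.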
The attachment is a demo exploit for CVE-2011-0065, tested with Firefox 3.6.16 on Windows 7 and Windows 8.
Le Manh Tung
Senior Security Researcher