Add parsing of OVERLAY section for PE files
I’m currently playing with the crackme files at https://github.com/Maijin/Workshop2015/tree/master/IOLI-crackme, which are available for both Linux and Windows, and I’ve been comparing angr’s behavior on matching binaries in the two file formats.
I’d like to record a few of the differences here to: a) see if I’m crazy, b) see whether more specific tickets should be created, and c) get some pointers or to-dos on what specifically needs to be implemented for better PE/EXE support.
The main function of the crackme0x00 ELF binary, as disassembled by radare2:
[0x08048414]> pdf
/ (fcn) sym.main 127
| ; var int local_6 @ ebp-0x18
| ; DATA XREF from 0x08048377 (sym.main)
| ;-- main:
| 0x08048414 55 push ebp
| 0x08048415 89e5 mov ebp, esp
| 0x08048417 83ec28 sub esp, 0x28
| 0x0804841a 83e4f0 and esp, 0xfffffff0
| 0x0804841d b800000000 mov eax, 0
| 0x08048422 83c00f add eax, 0xf
| 0x08048425 83c00f add eax, 0xf
| 0x08048428 c1e804 shr eax, 4
| 0x0804842b c1e004 shl eax, 4
| 0x0804842e 29c4 sub esp, eax
| 0x08048430 c70424688504. mov dword [esp], str.IOLI_Crackme_Level_0x00_n ; [0x8048568:4]=0x494c4f49 LEA str.IOLI_Crackme_Level_0x00_n ; "IOLI Crackme Level 0x00." @ 0x8048568
| 0x08048437 e804ffffff call sym.imp.printf
| 0x0804843c c70424818504. mov dword [esp], str.Password: ; [0x8048581:4]=0x73736150 LEA str.Password: ; "Password: " @ 0x8048581
| 0x08048443 e8f8feffff call sym.imp.printf
| 0x08048448 8d45e8 lea eax, [ebp-local_6]
| 0x0804844b 89442404 mov dword [esp + 4], eax
| 0x0804844f c704248c8504. mov dword [esp], 0x804858c ; [0x804858c:4]=0x32007325 ; "%s" @ 0x804858c
| 0x08048456 e8d5feffff call sym.imp.scanf
| 0x0804845b 8d45e8 lea eax, [ebp-local_6]
| 0x0804845e c74424048f85. mov dword [esp + 4], str.250382 ; [0x804858f:4]=0x33303532 LEA str.250382 ; "250382" @ 0x804858f
| 0x08048466 890424 mov dword [esp], eax
| 0x08048469 e8e2feffff call sym.imp.strcmp
| 0x0804846e 85c0 test eax, eax
| ,=< 0x08048470 740e je 0x8048480
| | 0x08048472 c70424968504. mov dword [esp], str.Invalid_Password__n ; [0x8048596:4]=0x61766e49 LEA str.Invalid_Password__n ; "Invalid Password!." @ 0x8048596
| | 0x08048479 e8c2feffff call sym.imp.printf
| ,==< 0x0804847e eb0c jmp 0x804848c
| |`-> 0x08048480 c70424a98504. mov dword [esp], str.Password_OK_:__n ; [0x80485a9:4]=0x73736150 LEA str.Password_OK_:__n ; "Password OK :)." @ 0x80485a9
| | 0x08048487 e8b4feffff call sym.imp.printf
| | ; JMP XREF from 0x0804847e (sym.main)
| `--> 0x0804848c b800000000 mov eax, 0
| 0x08048491 c9 leave
\ 0x08048492 c3 ret
[0x08048414]>
Similarly for the EXE:
(fcn) sym._main 141
| ; var int local_0_1 @ ebp-0x1
| ; var int local_6 @ ebp-0x18
| ; var int local_7 @ ebp-0x1c
| ; CALL XREF from 0x00401222 (sym._main)
| 0x00401310 55 push ebp
| 0x00401311 89e5 mov ebp, esp
| 0x00401313 83ec38 sub esp, 0x38
| 0x00401316 83e4f0 and esp, 0xfffffff0
| 0x00401319 b800000000 mov eax, 0
| 0x0040131e 83c00f add eax, 0xf
| 0x00401321 83c00f add eax, 0xf
| 0x00401324 c1e804 shr eax, 4
| 0x00401327 c1e004 shl eax, 4
| 0x0040132a 8945e4 mov dword [ebp-local_7], eax
| 0x0040132d 8b45e4 mov eax, dword [ebp-local_7]
| 0x00401330 e83b190000 call 0x402c70 ; sym.___w32_sharedptr_initialize+0x220
| 0x00401335 e836010000 call sym.___main
| 0x0040133a c70424004040. mov dword [esp], str.IOLI_Crackme_Level_0x00_n ; [0x404000:4]=0x494c4f49 LEA section..rdata ; "IOLI Crackme Level 0x00." @ 0x404000
| 0x00401341 e8ea190000 call sym._printf
| 0x00401346 c70424194040. mov dword [esp], str.Password: ; [0x404019:4]=0x73736150 LEA str.Password: ; "Password: " @ 0x404019
| 0x0040134d e8de190000 call sym._printf
| 0x00401352 8d45e8 lea eax, [ebp-local_6]
| 0x00401355 89442404 mov dword [esp + 4], eax
| 0x00401359 c70424244040. mov dword [esp], 0x404024 ; [0x404024:4]=0x32007325 ; "%s" 0x00404024 ; "%s" @ 0x404024
| 0x00401360 e8bb190000 call sym._scanf
| 0x00401365 8d45e8 lea eax, [ebp-local_6]
| 0x00401368 c74424042740. mov dword [esp + 4], str.250382 ; [0x404027:4]=0x33303532 LEA str.250382 ; "250382" @ 0x404027
| 0x00401370 890424 mov dword [esp], eax
| 0x00401373 e898190000 call sym._strcmp
| 0x00401378 85c0 test eax, eax
| ,=< 0x0040137a 740e je 0x40138a
| | 0x0040137c c704242e4040. mov dword [esp], str.Invalid_Password__n ; [0x40402e:4]=0x61766e49 LEA str.Invalid_Password__n ; "Invalid Password!." @ 0x40402e
| | 0x00401383 e8a8190000 call sym._printf
| ,==< 0x00401388 eb0c jmp 0x401396
| |`-> 0x0040138a c70424414040. mov dword [esp], str.Password_OK_:__n ; [0x404041:4]=0x73736150 LEA str.Password_OK_:__n ; "Password OK :)." @ 0x404041
| | 0x00401391 e89a190000 call sym._printf
| | ; JMP XREF from 0x00401388 (sym._main)
| `--> 0x00401396 b800000000 mov eax, 0
| 0x0040139b c9 leave
\ 0x0040139c c3 ret
First of all, angr doesn’t seem to recognize all the symbols in the EXE. For example, the following command works on the Linux binary but not on the EXE: main = proj.loader.main_bin.get_symbol('main')
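One detail worth noting: radare2 names the PE function sym._main, not sym.main. 32-bit Windows toolchains such as MinGW decorate C names (a leading underscore for cdecl, plus an @N argument-size suffix for stdcall), so even once PE symbols are exposed, a lookup for 'main' might have to go through '_main'. The sketch below is illustrative only: undecorate and find_symbol are hypothetical helpers, not part of CLE, and the name-to-address dict stands in for whatever a PE backend would provide.

```python
import re

# Assumed i386 Windows C name-decoration rules:
#   cdecl     main        -> _main
#   stdcall   MessageBoxA -> _MessageBoxA@16   (16 = bytes of arguments)
#   fastcall  Foo         -> @Foo@8
_STDCALL = re.compile(r"^_(?P<name>[^@]+)@(?P<argbytes>\d+)$")
_FASTCALL = re.compile(r"^@(?P<name>[^@]+)@(?P<argbytes>\d+)$")


def undecorate(symbol):
    """Return (plain_name, argument_bytes or None) for a decorated C symbol."""
    for pattern in (_STDCALL, _FASTCALL):
        match = pattern.match(symbol)
        if match:
            return match.group("name"), int(match.group("argbytes"))
    if symbol.startswith("_"):
        return symbol[1:], None  # cdecl: strip the single leading underscore
    return symbol, None


def find_symbol(symbols, wanted):
    """Hypothetical lookup: exact name first, then by undecorated name."""
    if wanted in symbols:
        return symbols[wanted]
    for name, addr in symbols.items():
        if undecorate(name)[0] == wanted:
            return addr
    return None


# Addresses from the PE listing (___main decoded from the call at 0x00401335).
pe_symbols = {"___main": 0x00401470, "_main": 0x00401310}
print(hex(find_symbol(pe_symbols, "main")))  # 0x401310
```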
I’m not sure whether it’s related, but building a CFG starting from main (using main’s address, since the symbol lookup fails as shown above) produces two very different CFGs, even though the disassembly above shows that the two functions should be essentially the same. On the ELF side, the CFG is as expected, with 9 basic blocks. (Graphs made using https://github.com/axt/angr-utils.)
On the PE side, the CFG has several hundred blocks and seems to span into other functions. Note: main is actually way down in the bottom right corner of this huge graph. Also, the function name of every node in this graph is “None” instead of the actual name as in the Linux graph (probably the same problem as above).
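The ELF block count can be checked by hand. Assuming, as with angr’s VEX-based lifting, that every call and branch ends a basic block, the block starts (“leaders”) of main are its entry point, the instruction after each call or conditional jump, and each intra-function jump target. The sketch below uses a hypothetical block_leaders helper, with the control-transfer instructions transcribed from the two radare2 listings and the function bounds taken from radare2’s reported sizes (127 and 141 bytes). It gives 9 blocks for the ELF main and 11 for the PE main; the two extra blocks come from the calls at 0x00401330 and 0x00401335, which have no ELF counterpart. So main by itself should not account for several hundred blocks, which suggests the rest come from the CFG following calls out of main.

```python
from typing import NamedTuple, Optional


class Transfer(NamedTuple):
    addr: int              # address of a control-transfer instruction
    size: int              # its length in bytes
    kind: str              # "call", "jcc", "jmp" or "ret"
    target: Optional[int]  # intra-function branch target; None for call/ret


def block_leaders(start, end, transfers):
    """Sorted basic-block start addresses inside the function [start, end)."""
    leaders = {start}
    for t in transfers:
        if t.kind in ("call", "jcc"):
            leaders.add(t.addr + t.size)  # execution continues after these
        if t.kind in ("jcc", "jmp") and t.target is not None:
            leaders.add(t.target)         # branch destinations start blocks
    return sorted(a for a in leaders if start <= a < end)


# Control transfers in main, transcribed from the radare2 listings.
ELF_MAIN = [
    Transfer(0x08048437, 5, "call", None),  # printf
    Transfer(0x08048443, 5, "call", None),  # printf
    Transfer(0x08048456, 5, "call", None),  # scanf
    Transfer(0x08048469, 5, "call", None),  # strcmp
    Transfer(0x08048470, 2, "jcc", 0x08048480),
    Transfer(0x08048479, 5, "call", None),  # printf
    Transfer(0x0804847E, 2, "jmp", 0x0804848C),
    Transfer(0x08048487, 5, "call", None),  # printf
    Transfer(0x08048492, 1, "ret", None),
]
PE_MAIN = [
    Transfer(0x00401330, 5, "call", None),  # 0x402c70
    Transfer(0x00401335, 5, "call", None),  # ___main
    Transfer(0x00401341, 5, "call", None),  # _printf
    Transfer(0x0040134D, 5, "call", None),  # _printf
    Transfer(0x00401360, 5, "call", None),  # _scanf
    Transfer(0x00401373, 5, "call", None),  # _strcmp
    Transfer(0x0040137A, 2, "jcc", 0x0040138A),
    Transfer(0x00401383, 5, "call", None),  # _printf
    Transfer(0x00401388, 2, "jmp", 0x00401396),
    Transfer(0x00401391, 5, "call", None),  # _printf
    Transfer(0x0040139C, 1, "ret", None),
]

print(len(block_leaders(0x08048414, 0x08048493, ELF_MAIN)))  # 9
print(len(block_leaders(0x00401310, 0x0040139D, PE_MAIN)))   # 11
```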
Finally, on the ELF side angr automatically hooks functions like scanf and printf, but on the PE side that doesn’t seem to happen. I can solve both the PE and ELF versions symbolically, but for the PE file I have to set up the hooks manually first.
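For manual hooking on the PE side, the addresses to hook can be read straight off the listing: a call rel32 (opcode E8) encodes its destination as a signed 32-bit little-endian displacement from the end of the 5-byte instruction. The sketch below decodes the printf, scanf and strcmp call sites from the PE listing; it assumes, based on radare2’s sym._printf, sym._scanf and sym._strcmp labels, that the destinations are the import stubs one would hook. call_target and PE_CALLS are illustrative names.

```python
import struct


def call_target(insn_addr, insn_bytes):
    """Destination of an x86 `call rel32` (opcode E8, 5 bytes long).

    rel32 is a signed displacement relative to the address of the *next*
    instruction, i.e. insn_addr + 5.
    """
    if len(insn_bytes) != 5 or insn_bytes[0] != 0xE8:
        raise ValueError("not a call rel32 instruction")
    (rel,) = struct.unpack("<i", insn_bytes[1:])
    return (insn_addr + 5 + rel) & 0xFFFFFFFF


# Call sites (address, machine code) copied from the PE listing of main.
PE_CALLS = {
    "printf": (0x00401341, "e8ea190000"),
    "scanf": (0x00401360, "e8bb190000"),
    "strcmp": (0x00401373, "e898190000"),
}

hook_addrs = {name: call_target(addr, bytes.fromhex(code))
              for name, (addr, code) in PE_CALLS.items()}
for name, target in hook_addrs.items():
    print(f"{name}: {target:#x}")
# printf: 0x402d30
# scanf: 0x402d20
# strcmp: 0x402d10
```

In a current angr install, each of these addresses could then be passed to Project.hook together with a SimProcedure such as angr.SIM_PROCEDURES['libc']['printf'](); this part is untested here, and the exact API has changed between angr versions.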
Maybe all of these problems come down to function symbols not being found properly; I’m not sure.
So… a brain dump, I know, but please feel free to split this into as many tickets as you see fit. I would love some pointers to places where someone new to the project could start helping to resolve these discrepancies. Thanks!
Created 7 years ago; 7 comments (2 by maintainers).
Top GitHub Comments
Most of this is known: a great deal of effort has gone into ELF and Linux support in angr, and almost none into Windows.
Most of these problems are unrelated to CLE; the only CLE-related one is the failing get_symbol call. To the best of my knowledge, the rest come down to angr lacking support for callee-cleanup calling conventions throughout, and to the lack of SimProcedures for Windows libraries. There might also be some Linux-specific logic in the CFG algorithm.
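To see why callee-cleanup conventions matter to an analysis, consider the stack pointer right after a call returns on 32-bit x86. Under cdecl the callee’s plain ret pops only the return address, so esp is back to its value at the call and the caller removes the arguments itself; under stdcall the callee executes ret N and also discards N bytes of arguments. An analysis that assumes caller cleanup therefore mis-tracks esp by N bytes after every stdcall call. The worked example below uses a hypothetical helper and an arbitrary stack address.

```python
def esp_after_return(esp_at_call, arg_bytes, callee_cleans):
    """Value of esp right after the callee returns (32-bit x86).

    esp_at_call is esp when `call` executes, i.e. after the arguments
    have already been pushed.
    """
    esp = esp_at_call - 4   # `call` pushes the 4-byte return address
    esp += 4                # `ret` / `ret N` pops the return address...
    if callee_cleans:
        esp += arg_bytes    # ...and `ret N` also discards N argument bytes
    return esp


ESP = 0x0060FF00  # arbitrary example stack pointer
cdecl = esp_after_return(ESP, 16, callee_cleans=False)
stdcall = esp_after_return(ESP, 16, callee_cleans=True)
print(hex(cdecl), hex(stdcall), stdcall - cdecl)  # 0x60ff00 0x60ff10 16
```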
The only person on the angr team who actually knows how Windows binaries work is @ltfish, but, more to the point, this is not something we can give much attention to before the CGC in August.
As for an initial jumping-off point, CLE’s support for PE files is based on the pefile module. If I recall correctly, we make no attempt whatsoever to provide symbols for PE files, so if you want to look into how pefile exposes symbols and perform the translation necessary to implement get_symbol for the PE backend, that’d be pretty cool.
This issue has been closed due to inactivity.
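Regarding get_symbol for the PE backend: MinGW-built executables often keep a COFF symbol table, and radare2 resolving sym._main in the listing suggests this EXE has one. Its location is given by the PointerToSymbolTable and NumberOfSymbols fields of the COFF file header, which pefile exposes as FILE_HEADER.PointerToSymbolTable and FILE_HEADER.NumberOfSymbols. The sketch below is a stdlib-only illustration, not CLE or pefile code: it parses the 18-byte symbol records plus the string table that follows them, and converts a section-relative symbol into a virtual address as image base + section RVA + Value (the PE/COFF specification’s interpretation of Value for external symbols in a defined section). read_coff_symbols and symbol_address are hypothetical names.

```python
import struct
from typing import NamedTuple


class CoffSymbol(NamedTuple):
    name: str
    value: int           # offset within the section for section-relative symbols
    section_number: int  # 1-based section index; <= 0: undefined/absolute/debug
    storage_class: int   # e.g. 2 = IMAGE_SYM_CLASS_EXTERNAL


SYMBOL_SIZE = 18  # every COFF symbol-table record is 18 bytes


def read_coff_symbols(data: bytes) -> list:
    """Parse the COFF symbol table of a PE image, if it still has one."""
    (e_lfanew,) = struct.unpack_from("<I", data, 0x3C)
    if data[e_lfanew:e_lfanew + 4] != b"PE\0\0":
        raise ValueError("not a PE file")
    # IMAGE_FILE_HEADER (20 bytes) follows the 4-byte signature.
    _mach, _nsect, _ts, sym_ptr, nsyms, _opt, _chars = struct.unpack_from(
        "<HHIIIHH", data, e_lfanew + 4)
    if sym_ptr == 0 or nsyms == 0:
        return []  # stripped image: no COFF symbols to recover
    strtab = sym_ptr + nsyms * SYMBOL_SIZE  # string table follows the records

    symbols = []
    i = 0
    while i < nsyms:
        raw_name, value, sect, _type, sclass, naux = struct.unpack_from(
            "<8sIhHBB", data, sym_ptr + i * SYMBOL_SIZE)
        if raw_name[:4] == b"\0\0\0\0":
            # Long name: bytes 4..8 hold an offset into the string table.
            (str_off,) = struct.unpack_from("<I", raw_name, 4)
            start = strtab + str_off
            name = data[start:data.index(b"\0", start)].decode("ascii", "replace")
        else:
            name = raw_name.rstrip(b"\0").decode("ascii", "replace")
        symbols.append(CoffSymbol(name, value, sect, sclass))
        i += 1 + naux  # skip the auxiliary records attached to this symbol
    return symbols


def symbol_address(sym, image_base, section_rvas):
    """Virtual address of a section-relative symbol (section_rvas is 0-indexed)."""
    if sym.section_number <= 0:
        return None  # undefined, absolute or debug symbol
    return image_base + section_rvas[sym.section_number - 1] + sym.value
```

With such a list, a PE get_symbol could look names up directly, possibly after undoing the leading-underscore decoration that 32-bit Windows C symbols carry.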