I cannot get a correct CFG when analyzing a program with indirect calls
Hi, I am analyzing a program with indirect calls using angr. However, the constructed CFG seems to be incorrect. Below is my test program.
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>

void fun1()
{
    printf("fun1\n");
}

void fun2()
{
    printf("fun2\n");
}

void fun3()
{
    printf("fun3\n");
}

void fun4()
{
    printf("fun4\n");
}

void (*fun[4])(void) = {fun1, fun2, fun3, fun4};

void callsites(int x)
{
    (*fun[x])();
    //fun1();
}

void main(int argc, char** argv)
{
    int i = atoi(argv[1]);
    printf("%d\n", i);
    callsites(i);
}
Below are the commands.
import angr
b = angr.Project("indirect_call", load_options={'auto_load_libs': False})
cfg = b.analyses.CFG(keep_state=True, enable_symbolic_back_traversal = True)
print cfg.graph
print "----------------------------------------------"
print "It has %d nodes and %d edges" % (len(cfg.graph.nodes()), len(cfg.graph.edges()))
target_func = cfg.kb.functions.function(name="callsites")
print "----------------------------------------------"
print target_func
print "----------------------------------------------"
target_func = cfg.kb.functions.function(name="fun1")
print target_func
target_func = cfg.kb.functions.function(name="fun2")
print target_func
target_func = cfg.kb.functions.function(name="fun3")
print target_func
target_func = cfg.kb.functions.function(name="fun4")
print target_func
The result is as follows.
WARNING | 2016-05-07 19:06:02,381 | simuvex.s_run | Exit state has over 257 possible solutions. Likely unconstrained; skipping. <BV64 reg_18_47_64>
----------------------------------------------
It has 43 nodes and 58 edges
----------------------------------------------
Function callsites [0x40060a]
Syscall: False
SP difference: 0
Has return: False
Returning: Unknown
Arguments: reg: [], stack: []
Blocks: []
Calling convention: UnknownCC - AMD64 [] sp_delta=0
----------------------------------------------
None
None
None
None
I am not sure whether I need to set the option enable_symbolic_back_traversal to True, but the result is more or less the same whether I set it or not.
angr finds the function callsites, but it gives no detailed information about it. Moreover, angr fails to find fun1, fun2, fun3, and fun4 at all, even though callsites calls them indirectly.
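The warning in the output above says the exit target had more than 257 possible solutions and was skipped as likely unconstrained. The following is a minimal sketch of that kind of cutoff in plain Python, not angr's or simuvex's actual solver code. The function name `concretize_targets` and the table address are hypothetical; only the limit of 257 is taken from the log line.

```python
from itertools import islice

def concretize_targets(candidates, limit=257):
    """Return the candidate values as a list, or None if there are more than `limit`.

    `candidates` is any iterable of possible values (a hypothetical stand-in
    for asking a constraint solver to enumerate solutions).
    """
    # Pull at most limit + 1 values so a huge stream is never fully enumerated.
    found = list(islice(candidates, limit + 1))
    if len(found) > limit:
        return None  # treated as unconstrained: the exit is skipped
    return found

TABLE_BASE = 0x601060  # hypothetical address of the `fun` array

# Index x constrained to 0..3: only four table slots can be read.
bounded = concretize_targets(TABLE_BASE + 8 * x for x in range(4))

# Index x unconstrained: far more candidate slots than the limit allows.
unbounded = concretize_targets(TABLE_BASE + 8 * x for x in range(2**31))

print([hex(a) for a in bounded])
print(unbounded)
```

In the test program nothing bounds `x` before the call, so the second case is the one that applies, and no successor is recorded for the indirect call.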
If I comment out all statements in callsites, i.e., leave callsites empty, the result changes as follows.
----------------------------------------------
It has 43 nodes and 59 edges
----------------------------------------------
Function callsites [0x40060a]
Syscall: False
SP difference: 0
Has return: True
Returning: True
Arguments: reg: [72], stack: [0L]
Blocks: []
Calling convention: System V AMD64 - AMD64 [<rdi>]
----------------------------------------------
None
None
None
None
This time angr gives detailed information about callsites and finds 59 edges (one more than in the previous experiment).
Then I uncomment the statement fun1(); in callsites, so that callsites calls fun1 directly. The result is as follows.
----------------------------------------------
It has 48 nodes and 66 edges
----------------------------------------------
Function callsites [0x40060a]
Syscall: False
SP difference: 0
Has return: True
Returning: True
Arguments: reg: [72], stack: [0L]
Blocks: [0x40060a, 0x40061f]
Calling convention: System V AMD64 - AMD64 [<rdi>]
----------------------------------------------
Function fun1 [0x4005c6]
Syscall: False
SP difference: 0
Has return: True
Returning: True
Arguments: reg: [], stack: [0L]
Blocks: [0x4005d4, 0x4005c6]
Calling convention: System V AMD64 - AMD64 []
None
None
None
Now angr gives detailed information about both callsites and fun1, and it finds more nodes and edges than in the previous two experiments.
So I suspect that angr fails to handle indirect calls. Or is there an error in the commands above?
Thanks a lot!
Ting Chen
Top GitHub Comments
Recovering the CFG from machine code is not as easy as it seems to be. In your example (the buggy one), (*fun[0])(), (*fun[-1])(), and (*fun[13337])() are all equivalent to the CPU, and apparently, they are all possible targets for that call.

angr will be able to resolve the jump table in callsites() with heuristics once the jump table resolution is fully implemented in CFGAccurate. Heuristics are essential here in order to identify the base address of the function pointer array, which might work for a lot of real-world cases, but is impossible to work for all such cases.

For this specific example, I believe the best way is to rely on the jump table resolution of CFGAccurate. I’ll work on it later this week.

CFGAccurate is not designed to be a vulnerability finder. You can modify angr’s code (start from CFGAccurate._get_simrun() in cfg_accurate.py) in order to catch this specific case.

However, having too many possible solutions does not necessarily mean the existence of a bug. It could also mean some indirect jump is not easily recoverable. You’ll have to make some policies in order to use CFGAccurate to detect bugs/vulnerabilities.

I assume you mean “it does not work even when context_sensitivity_level or call_depth is used”. (If I remember correctly), as an optimization (for speed), symbolic back traversal does not go back beyond function boundaries. It also has a limit on how many basic blocks angr should traverse back. In your case, I think it’s likely that angr does not traverse back enough basic blocks to see your constraint on i. You can adjust some parameters inside CFGAccurate._symbolically_back_traverse() to make it work in this case.

The control flow recovery is not meant to be a vulnerability analysis. You want to write your own static/symbolic analysis that searches for dereferences on user input.
I’m sorry about the vague answer, but if there were an easy answer to “how can I use angr to find bugs in programs” then there would be a lot less research going on right now 😃
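As a footnote to the earlier comment about symbolic back traversal, the effect of a traversal limit can be sketched in plain Python. This is a toy model: the predecessor graph, the block names, and the `max_depth` parameter are hypothetical and do not reflect the internals of CFGAccurate._symbolically_back_traverse().

```python
from collections import deque

# Hypothetical predecessor map: block -> blocks that jump to it.
PREDECESSORS = {
    "call_site": ["b3"],
    "b3": ["b2"],
    "b2": ["b1"],
    "b1": ["bounds_check"],
    "bounds_check": [],
}

def blocks_seen_going_back(start, max_depth):
    """Breadth-first walk over predecessors, stopping after max_depth steps."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        block, depth = queue.popleft()
        if depth == max_depth:
            continue  # limit reached: do not look further back from here
        for pred in PREDECESSORS.get(block, []):
            if pred not in seen:
                seen.add(pred)
                queue.append((pred, depth + 1))
    return seen

# With a small limit the block that constrains the index is never reached.
print("bounds_check" in blocks_seen_going_back("call_site", 2))
print("bounds_check" in blocks_seen_going_back("call_site", 4))
```

If the block holding the constraint lies beyond the limit, the index still looks unconstrained when the call target is evaluated, which matches the behavior described in the comment.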