I cannot get the correct CFG when analyzing a program with indirect calls


Hi, I am analyzing a program with indirect calls using angr. However, the constructed CFG seems to be incorrect. Below is my test program.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <stdlib.h>
void fun1()
{
  printf("fun1\n");
}
void fun2()
{
  printf("fun2\n");
}
void fun3()
{
  printf("fun3\n");
}
void fun4()
{
  printf("fun4\n");
}

void (*fun[4])(void) = {fun1, fun2, fun3, fun4};

void callsites(int x)
{
  (*fun[x])();
  //fun1();
}

void main(int argc, char** argv)
{
  int i = atoi(argv[1]);
  printf("%d\n", i);
  callsites(i);
}

Below are the commands.

import angr
b = angr.Project("indirect_call", load_options={'auto_load_libs': False})
cfg = b.analyses.CFG(keep_state=True, enable_symbolic_back_traversal = True)
print cfg.graph
print "----------------------------------------------"
print "It has %d nodes and %d edges" % (len(cfg.graph.nodes()), len(cfg.graph.edges()))
target_func = cfg.kb.functions.function(name="callsites")
print "----------------------------------------------"
print target_func
print "----------------------------------------------"
target_func = cfg.kb.functions.function(name="fun1")
print target_func
target_func = cfg.kb.functions.function(name="fun2")
print target_func
target_func = cfg.kb.functions.function(name="fun3")
print target_func
target_func = cfg.kb.functions.function(name="fun4")
print target_func

The result is as follows.

WARNING | 2016-05-07 19:06:02,381 | simuvex.s_run | Exit state has over 257 possible solutions. Likely unconstrained; skipping. <BV64 reg_18_47_64>

----------------------------------------------
It has 43 nodes and 58 edges
----------------------------------------------
Function callsites [0x40060a]
  Syscall: False
  SP difference: 0
  Has return: False
  Returning: Unknown
  Arguments: reg: [], stack: []
  Blocks: []
  Calling convention: UnknownCC - AMD64 [] sp_delta=0
----------------------------------------------
None
None
None
None

I am not sure whether I need to set the option enable_symbolic_back_traversal to True, but the result is more or less the same whether I set it or not.

angr finds the function callsites but gives no detailed information about it (its block list is empty). It also fails to find fun1, fun2, fun3, and fun4, which are called indirectly by callsites.

If I comment out all statements in callsites, i.e., leave callsites empty, the result changes as follows.

----------------------------------------------
It has 43 nodes and 59 edges
----------------------------------------------
Function callsites [0x40060a]
  Syscall: False
  SP difference: 0
  Has return: True
  Returning: True
  Arguments: reg: [72], stack: [0L]
  Blocks: []
  Calling convention: System V AMD64 - AMD64 [<rdi>]
----------------------------------------------
None
None
None
None

This time angr gives detailed information about callsites and finds 59 edges (one more than in the previous experiment).

Then I uncommented the statement fun1(); in callsites, so that callsites calls fun1 directly. The result is as follows.


----------------------------------------------
It has 48 nodes and 66 edges
----------------------------------------------
Function callsites [0x40060a]
  Syscall: False
  SP difference: 0
  Has return: True
  Returning: True
  Arguments: reg: [72], stack: [0L]
  Blocks: [0x40060a, 0x40061f]
  Calling convention: System V AMD64 - AMD64 [<rdi>]
----------------------------------------------
Function fun1 [0x4005c6]
  Syscall: False
  SP difference: 0
  Has return: True
  Returning: True
  Arguments: reg: [], stack: [0L]
  Blocks: [0x4005d4, 0x4005c6]
  Calling convention: System V AMD64 - AMD64 []
None
None
None

Now angr gives detailed information about both callsites and fun1, and it finds more nodes and edges than in the previous two experiments.

So I suspect that angr fails to handle indirect calls. Or is there an error in the commands above?

Thanks a lot!

Ting Chen

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Comments: 11 (6 by maintainers)

Top GitHub Comments

ltfish commented, May 9, 2016

> I hope Angr can construct CFG correctly without the explicit check code. In many cases, we cannot change the source of the tested program, so we cannot add the explicit check manually.

Recovering the CFG from machine code is not as easy as it seems to be. In your example (the buggy one), (*fun[0])(), (*fun[-1])(), and (*fun[13337])() are all equivalent to the CPU, and apparently, they are all possible targets for that call.
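
A minimal sketch of this point, using a toy memory model in plain Python rather than angr itself. Only fun1's address (0x4005c6) comes from the output above; the table address, the other function addresses, and the helper name call_targets are assumptions made for illustration.

```python
# Toy model of `call [TABLE_BASE + 8*x]`: memory maps slot address -> pointer.
TABLE_BASE = 0x601060  # hypothetical address of the `fun` array
FUNCS = {
    "fun1": 0x4005C6,  # taken from the issue output
    "fun2": 0x4005D7,  # assumed
    "fun3": 0x4005E8,  # assumed
    "fun4": 0x4005F9,  # assumed
}
memory = {TABLE_BASE + 8 * i: addr for i, addr in enumerate(FUNCS.values())}


def call_targets(indices):
    """Collect every value the indirect call could load for the given indices."""
    targets = set()
    for x in indices:
        slot = TABLE_BASE + 8 * x
        # Outside the table the CPU still loads something; model it as "unknown".
        targets.add(memory.get(slot, "unknown"))
    return targets


# x taken straight from atoi(): any index is possible, including out-of-range ones.
unchecked = call_targets(range(-2, 300))
# x guarded by a bounds check such as `if (0 <= x && x < 4)`.
checked = call_targets(range(0, 4))

print(len(checked), "unknown" in unchecked)
```

With the bounds check the call has exactly four candidate targets; without it, the set of candidates also contains whatever happens to lie outside the array, which is why the analysis reports the exit as unconstrained.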

angr will be able to resolve the jump table in callsites() with heuristics once jump table resolution is fully implemented in CFGAccurate. Heuristics are essential here to identify the base address of the function pointer array; they may work for many real-world cases, but they cannot work for all of them.
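
One possible heuristic of this kind, sketched in plain Python: scan a data section for runs of pointer-sized values that all point into the code section. This is not angr's implementation. The function name find_pointer_tables, the section addresses, and all function addresses other than fun1's are assumptions.

```python
import struct


def find_pointer_tables(data, data_base, text_start, text_end, min_entries=2):
    """Find runs of 8-byte little-endian values pointing into [text_start, text_end).

    Returns a list of (table_address, [targets]) tuples.
    """
    tables, run, run_start = [], [], None
    for off in range(0, len(data) - 7, 8):
        (value,) = struct.unpack_from("<Q", data, off)
        if text_start <= value < text_end:
            if not run:
                run_start = data_base + off
            run.append(value)
        else:
            # A non-code value ends the current run; keep it if long enough.
            if len(run) >= min_entries:
                tables.append((run_start, run))
            run = []
    if len(run) >= min_entries:
        tables.append((run_start, run))
    return tables


# Hypothetical .data contents: an unrelated word, the four-entry array, another word.
funcs = [0x4005C6, 0x4005D7, 0x4005E8, 0x4005F9]
data = struct.pack("<Q", 0) + struct.pack("<4Q", *funcs) + struct.pack("<Q", 0x1234)

tables = find_pointer_tables(
    data, data_base=0x601058, text_start=0x4004D0, text_end=0x4006C0
)
for addr, targets in tables:
    print(hex(addr), [hex(t) for t in targets])
```

The sketch also shows why such a heuristic cannot be complete: any data word that happens to look like a code address is indistinguishable from a real table entry, and a table built at run time would not appear in the section at all.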

For this specific example, I believe the best way is to rely on the jump table resolution of CFGAccurate. I’ll work on it later this week.

> If the tested program is not written by me, and I get a warning like WARNING | 2016-05-09 13:44:43,902 | simuvex.s_run | Exit state has over 257 possible solutions. Likely unconstrained; skipping. <BV64 reg_18_47_64>. How can I locate and diagnose the bug?

CFGAccurate is not designed to be a vulnerability finder. You can modify angr’s code (start from CFGAccurate._get_simrun() in cfg_accurate.py) in order to catch this specific case.

However, having too many possible solutions does not necessarily mean that a bug exists. It could also mean that some indirect jump is not easily recoverable. You’ll have to define some policies of your own in order to use CFGAccurate to detect bugs/vulnerabilities.

> It does work no matter adding context_sensitivity_level or call_depth.

I assume you mean “it does not work even when context_sensitivity_level or call_depth is used”. If I remember correctly, as a speed optimization, symbolic back traversal does not go back beyond function boundaries. It also limits how many basic blocks angr traverses back. In your case, angr likely does not traverse back far enough to see your constraint on i. You can adjust some parameters inside CFGAccurate._symbolically_back_traverse() to make it work in this case.
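
The effect of those two limits can be sketched with a toy backward walk over a block graph. This only illustrates the idea and is not the code in CFGAccurate; the block names, the graph layout, and the helper blocks_seen_going_back are hypothetical.

```python
from collections import deque


def blocks_seen_going_back(preds, func_of, start, max_depth):
    """Walk predecessor edges from `start`, at most `max_depth` steps back,
    never leaving the function that contains `start`."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        block, depth = queue.popleft()
        if depth == max_depth:
            continue  # depth limit reached on this path
        for pred in preds.get(block, []):
            if pred not in seen and func_of[pred] == func_of[start]:
                seen.add(pred)
                queue.append((pred, depth + 1))
    return seen


# Case 1: the check on `i` lives in main, the indirect call in callsites.
preds = {
    "callsites_call": ["callsites_entry"],
    "callsites_entry": ["main_check"],
    "main_check": ["main_entry"],
}
func_of = {
    "callsites_call": "callsites",
    "callsites_entry": "callsites",
    "main_check": "main",
    "main_entry": "main",
}
seen = blocks_seen_going_back(preds, func_of, "callsites_call", max_depth=3)

# Case 2: the check is in the same function but three blocks back.
chain = {"b3": ["b2"], "b2": ["b1"], "b1": ["b0_check"]}
same_func = {name: "f" for name in ("b3", "b2", "b1", "b0_check")}
shallow = blocks_seen_going_back(chain, same_func, "b3", max_depth=2)
deep = blocks_seen_going_back(chain, same_func, "b3", max_depth=3)

print(sorted(seen), "b0_check" in shallow, "b0_check" in deep)
```

In the first case the block holding the check is never visited because it belongs to the caller; in the second it is visited only once the depth limit is raised, which mirrors the suggestion to adjust the traversal parameters.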

rhelmot commented, May 9, 2016

The control flow recovery is not meant to be a vulnerability analysis. You want to write your own static/symbolic analysis that searches for dereferences on user input.

I’m sorry about the vague answer, but if there were an easy answer to “how can I use angr to find bugs in programs” then there would be a lot less research going on right now 😃
