Ghidra cannot disassemble many bytes for ARM binaries in Thumb mode

Describe the bug: Ghidra does not disassemble many bytes in the test case binaries. For example, there should be code at address 0x8ac9c, but Ghidra leaves the bytes as data and does not disassemble them.

To Reproduce: Load the file into Ghidra. After Ghidra finishes the analysis, jump to the address shown in the screenshot.

Expected behavior: Assembly should be listed for these bytes rather than leaving them as data marked with a question mark.

Screenshots image

Attachments test_bin.zip

Environment (please complete the following information):

  • OS: [e.g. Ubuntu 18.04]
  • Java Version: [e.g. 11.0]
  • Ghidra Version: [e.g. 9.0.4]

Issue Analytics

  • State: open
  • Created 4 years ago
  • Comments: 8 (3 by maintainers)

Top GitHub Comments

3 reactions
emteere commented, Jun 7, 2019

The switch statement is not recovering for two reasons:

  • There is no proper guard on the switch to indicate the number of cases; both the input R0 and the return value from the called function feed into the switch table.
  • The call in the guard is actually not a call at all, but a long jump.

When data flow such as this isn’t working as expected, there is usually a reason, and fixing those issues will always make your reverse engineering results better with a decompiler.

If you modify the BL 0x88BA0 to be a branch instead of a call, the switch statement will recover. You may need to re-analyze the area. This issue is indicative of a larger problem: the program uses BL instructions, which are normally used for calls, as long branches.
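To see why a BL can stand in for a long jump, it can help to decode one by hand. The following is a minimal sketch in plain Python, not a Ghidra script: it decodes the 32-bit Thumb BL encoding into its destination address. The helper names `sign_extend` and `decode_thumb_bl` and the sample address 0x8000 are assumptions made for illustration and do not come from the binary in this issue. The destination is computed the same way as for a branch; the extra effect of BL is that it writes the return address to LR, so code that never returns through LR can use BL purely for its reach.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign_bit = 1 << (bits - 1)
    return (value & (sign_bit - 1)) - (value & sign_bit)


def decode_thumb_bl(hw1, hw2, addr):
    """Return the destination of a 32-bit Thumb BL located at addr.

    hw1 and hw2 are the first and second 16-bit halfwords of the
    instruction. Returns None if the halfwords are not a BL.
    """
    # First halfword:  11110 S imm10
    # Second halfword: 11 J1 1 J2 imm11
    if (hw1 & 0xF800) != 0xF000 or (hw2 & 0xD000) != 0xD000:
        return None

    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF

    # I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S)
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)

    # offset = SignExtend(S:I1:I2:imm10:imm11:'0'), a 25-bit signed value
    raw = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    offset = sign_extend(raw, 25)

    # In Thumb state the PC reads as the instruction address plus 4.
    return addr + 4 + offset


# Illustrative encodings at a made-up address:
# F7FF FFFE is a BL to itself, F000 F800 is a BL to the next instruction.
print(hex(decode_thumb_bl(0xF7FF, 0xFFFE, 0x8000)))
print(hex(decode_thumb_bl(0xF000, 0xF800, 0x8000)))
```

Nothing in the encoding says whether the destination is a function or just distant code inside the same routine, which is why an analysis tool has to treat BL as a call by default.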

There are mechanisms that will simulate the switch call and recover all the branches; however, the decompiler flow analysis is much better and can recover why each case is taken.

If you really want to get the code out of this large auto-generated YACC routine, and other routines, then you’ll need to do the analysis a bit more carefully.

If you don’t fix these, then recovering the switch statement is the least of your issues with the binary. There are many more that need to be fixed. When a binary has these issues, the default analysis needs to be modified so that analyzers don’t act on functions that were incorrectly created from these BL instructions, which look like calls but are really long jumps.

Right after importing the binary, go to the following address and disassemble it as Thumb. This lets the non-returning thunk to stack_check_fail be found before a lot of flow damage has to be fixed, which wastes time.

  • Go to 0x000118c8 and press F12 to disassemble as Thumb.
  • Create a function at 0x000118c8.

Then analyze the binary, and:

  • Turn off Discover Non-Returning.
  • Turn off Shared Return analysis.
  • Turn off Stack Analysis.
  • Turn off ARM constant propagation.

Then analyze the code.

You’ll re-analyze later when the code is fixed. Leaving these on will waste time and spray bad references all over because the code is malformed.

You can then run one of two scripts. The first, Fix_ARM_Call_JumpsScript, goes through all the calls and attempts to figure out the correct flow, changing BLs that are really long jumps to branches.

Unfortunately, this doesn’t fully work on your massive routine. For that routine:

  • Select the addresses from 0x7ff5c through 0x89150 with Select->Bytes…->ToAddress. I would also turn this into a highlight so you can easily get the selection back, as selections are brittle.
  • Clear all the functions within the selection, except the top one, with Edit->ClearWithOptions…
  • Run Override_ARM_Call_JumpsScript. This will force any BL with a destination within the selected region to a branch flow.

You may need to do this repeatedly as more code is found, and you may need to analyze within the area so that the switches will recover. Normally this is an automated process, but the newly found code will have badly flowing BLs in it.
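The selection rule described for the override step can be modeled in a few lines. This is a plain-Python sketch of the stated rule only (destination inside the selected region), not the Ghidra script itself, whose actual checks may differ. The function name `bls_to_override` and the BL site addresses 0x80010 and 0x80020 are hypothetical; the region bounds, the 0x88BA0 destination, and the 0x000118c8 thunk address are the ones mentioned in this comment.

```python
def bls_to_override(bl_sites, region_start, region_end):
    """Pick the BLs whose destination lies inside the selected region.

    bl_sites is an iterable of (site_address, destination_address) pairs.
    The region is inclusive at both ends.
    """
    return [
        (site, target)
        for site, target in bl_sites
        if region_start <= target <= region_end
    ]


# Region bounds taken from the selection step in this comment.
REGION_START = 0x7FF5C
REGION_END = 0x89150

# The site addresses below are made up for illustration.
sample_sites = [
    (0x80010, 0x88BA0),  # destination inside the large routine: long jump
    (0x80020, 0x118C8),  # destination outside the region: stays a call
]

overrides = bls_to_override(sample_sites, REGION_START, REGION_END)
for site, target in overrides:
    print(f"BL at {site:#x} -> {target:#x}: treat as branch")
```

Under this rule a BL that leaves the region keeps its call flow, which matches the intent of leaving real calls, such as the one to the non-returning thunk, untouched.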

We could automate some of this, for instance clearing the functions that exist in the override area. We’ve planned to come up with a better automated solution for this issue, but automating the repair of flow this bad can be tricky and can cause more hidden damage if not done correctly.

Once you’re sure you’ve cleaned up the above BL issues, you can re-run auto-analysis and turn on ARM Constant Propagation. I’d still be careful turning back on Shared Return and Non-Returning, because you may discover more code that has bad flowing BL’s.

The FixupNoReturn script can be used to hand choose non-returning functions.

2 reactions
pwmoore commented, Oct 1, 2019

I was actually just going to open a new issue, but found this one searching, so it may be worth it to just append here. Let me know if you’d rather me open a new one.

I’ve noticed the same behavior on both x86_64 and AArch64 Mach-O files: large switch statements (like those generated by yacc/bison) fail to disassemble correctly, and as a result the decompilation is all messed up. Unfortunately, I cannot share the binary I’m working on, so I’ve been trying to create a test case that I can share.

Let me know if you’d like me to open a separate issue for this.
