Ghidra cannot disassemble many bytes for ARM binaries in Thumb mode
Describe the bug
Ghidra does not disassemble many bytes in the test case binaries. For example, there should be code at address 0x8ac9c, but Ghidra leaves the bytes as data and does not disassemble them.
To Reproduce
Steps to reproduce the behavior: feed the file to Ghidra, then jump to the address in the screenshot after Ghidra finishes the analysis.
Expected behavior
Assembly language should be listed for these bytes rather than leaving them as data marked with question marks.
Screenshots
Attachments test_bin.zip
Environment (please complete the following information):
- OS: [e.g. Ubuntu 18.04]
- Java Version: [e.g. 11.0]
- Ghidra Version: [e.g. 9.0.4]
Issue Analytics
- State:
- Created 4 years ago
- Comments:8 (3 by maintainers)
The switch statement is not recovering for two reasons:
If you modify the BL 0x88BA0 to be a branch instead of a call, the switch statement will recover. You may need to re-analyze the area. This issue is indicative of a larger problem: the program uses BL instructions, which are normally used for calls, as long branches.
There are mechanisms that will simulate the switch call and recover all the branches, however the decompiler flow analysis is much better, and can recover why each case is taken.
If you really want to get the code out of this large auto-generated YACC routine, and other routines, then you’ll need to do the analysis a bit more carefully.
If you don’t fix these, then recovering the switch statement is the least of your issues with the binary. There are many more that need to be fixed. When encountering a binary that has these issues the default analysis will need to be modified so that certain things don’t occur based on the incorrect creation of functions from these BL call-long jump instructions.
Right after importing the binary, go to the following addresses and disassemble them as Thumb. This lets the non-returning thunk to stack_check_fail be found before a lot of flow damage has to be fixed, which wastes time.
- 0x000118c8: press F12 to disassemble as Thumb
- Create a function at 0x000118c8
Then analyze the binary, and:
- Turn off Discover Non-Returning
- Turn off Shared Return analysis
- Turn off Stack Analysis
- Turn off ARM constant propagation
Then analyze the code.
You’ll re-analyze later when the code is fixed. Leaving these on will waste time and spray bad references all over because the code is malformed.
You can then run one of two scripts. Fix_ARM_Call_JumpsScript goes through all the calls and attempts to figure out the correct flow, changing BLs that are really long jumps to branches.
Unfortunately this doesn’t fully work on your massive routine. Select the addresses from 0x7ff5c through 0x89150 with Select->Bytes…->ToAddress. I would also turn this into a Highlight so you can easily get the selection back, as a selection is brittle. Clear all the functions within the selection, except the top one, with Edit->ClearWithOptions… Then run Override_ARM_Call_JumpsScript. This will force any BLs with destinations within the selected region to a Branch flow. You may need to do this repeatedly as more code is found, and you may need to analyze within the area so that the switches will recover. Normally this is an automated process, but the newly found code will have bad-flowing BLs in it.
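To make the selection criterion concrete, here is a minimal standalone Python sketch of the test described above: decode the destination of a 32-bit Thumb BL and check whether it lands inside the selected region. This is not the code of Override_ARM_Call_JumpsScript and does not use the Ghidra API. The function names are hypothetical, and the instruction address 0x88000 in the usage lines is an assumed example, chosen only so that the decoded destination is the 0x88BA0 mentioned earlier.

```python
def is_thumb_bl(hw1, hw2):
    """Return True if the two halfwords form a 32-bit Thumb BL (encoding T1)."""
    # First halfword: 11110 S imm10; second halfword: 11 J1 1 J2 imm11.
    return (hw1 & 0xF800) == 0xF000 and (hw2 & 0xD000) == 0xD000


def thumb_bl_target(addr, hw1, hw2):
    """Compute the destination of a Thumb BL located at address addr."""
    s = (hw1 >> 10) & 1
    imm10 = hw1 & 0x3FF
    j1 = (hw2 >> 13) & 1
    j2 = (hw2 >> 11) & 1
    imm11 = hw2 & 0x7FF
    # I1 = NOT(J1 XOR S), I2 = NOT(J2 XOR S)
    i1 = 1 - (j1 ^ s)
    i2 = 1 - (j2 ^ s)
    # 25-bit offset S:I1:I2:imm10:imm11:0, sign-extended through S.
    imm = (s << 24) | (i1 << 23) | (i2 << 22) | (imm10 << 12) | (imm11 << 1)
    if s:
        imm -= 1 << 25
    # In Thumb state the PC reads as the instruction address plus 4.
    return addr + 4 + imm


def bl_targets_region(addr, hw1, hw2, start, end):
    """True if this is a BL whose destination lies in [start, end]."""
    return is_thumb_bl(hw1, hw2) and start <= thumb_bl_target(addr, hw1, hw2) <= end


# Assumed example: a BL at the hypothetical address 0x88000 encoded as F000 FDCE.
REGION = (0x7FF5C, 0x89150)  # the selection range given in the comment
print(hex(thumb_bl_target(0x88000, 0xF000, 0xFDCE)))
print(bl_targets_region(0x88000, 0xF000, 0xFDCE, *REGION))
```

A BL that passes this check jumps back into the same large routine, which is why treating it as a branch rather than a call is the plausible flow there.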
We could automate some of this, for instance the clearing of the existence of the functions in the override area. We’ve planned to come up with a better automated solution for this issue, but automating this bad of flow can be tricky, and can cause more hidden damage if not done correctly.
Once you’re sure you’ve cleaned up the above BL issues, you can re-run auto-analysis and turn on ARM Constant Propagation. I’d still be careful turning back on Shared Return and Non-Returning, because you may discover more code that has bad flowing BL’s.
The FixupNoReturn script can be used to hand choose non-returning functions.
I was actually just going to open a new issue, but found this one searching, so it may be worth it to just append here. Let me know if you’d rather me open a new one.
I’ve noticed the same behavior on both x86_64 and AArch64 Mach-O files: large switch statements (like those generated by yacc/bison) fail to disassemble correctly, and as a result the decompilation is all messed up. Unfortunately I cannot share the binary I’m working on, so I’ve been trying to create a test case that I can share.
Let me know if you’d like me to open a separate issue for this.