Format string analysis does not recognize some strings
Describe the bug
Hello, I am running into more issues with format strings in the Variadic Function Signature Override analysis.
It seems that when a string passed to one of the format string functions (`printf`, `sprintf`, etc.) contains an ANSI escape code, and that string also happens to have a `%` as its 20th character, the analysis fails with a `StringIndexOutOfBoundsException`. This is due to the `BUFFER_LENGTH=20` here: https://github.com/NationalSecurityAgency/ghidra/blob/0241b2b97ebc1356850186796153b0e5f509f96e/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/PcodeFunctionParser.java#L193-L194
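To make the failure mode concrete, here is a minimal, self-contained Java sketch. It assumes (this is not taken from the Ghidra source) that the 20-character limit truncates the string read from the undefined location, and that the parser, after seeing a `%`, reads the next character without checking the string length. The class and method names are hypothetical. Truncating the reproducer's format string to 20 characters leaves `%` as the last character (index 19), so the look-ahead asks for index 20 of a 20-character string, which matches the `String index out of range: 20` in the trace further down.

```java
// Hypothetical reproduction of the failure mode; not the actual FormatStringParser code.
public class TruncatedFormatDemo {
    // Assumed to mirror BUFFER_LENGTH=20 in PcodeFunctionParser.
    public static final int BUFFER_LENGTH = 20;

    // Keeps at most BUFFER_LENGTH characters, as a fixed-size read would.
    public static String truncate(String s) {
        return s.length() <= BUFFER_LENGTH ? s : s.substring(0, BUFFER_LENGTH);
    }

    // Naive look-ahead: returns the character after the first '%' without a bounds check.
    public static char conversionAfterPercent(String s) {
        int i = s.indexOf('%');
        return s.charAt(i + 1);
    }

    public static void main(String[] args) {
        String full = "reproduces error : %s!\n\u001b[0m";
        String truncated = truncate(full);
        System.out.println(truncated.length());           // 20
        System.out.println(conversionAfterPercent(full)); // s
        try {
            conversionAfterPercent(truncated);
        } catch (StringIndexOutOfBoundsException e) {
            // The exception message wording differs between JDK versions.
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```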
It looks like this error happens because the string with the ANSI escape code is marked as `DAT_xxxx` instead of a `TerminatedCString`; when I change the data in memory to a `TerminatedCString`, the analysis passes without error.
I assume the unprintable character(s) in the ANSI escape code(s) prevent Ghidra from recognizing this as a string.
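The difference between a `DAT_` label and a `TerminatedCString` can be illustrated outside Ghidra. The hypothetical Java sketch below models memory as a byte array: reading up to the NUL terminator recovers the whole format string, while a read capped at 20 bytes stops right after the `%`. It also includes a deliberately simplistic printable-ASCII check (an assumption for illustration, not Ghidra's actual string-recognition heuristic) that rejects the string because of the ESC byte `0x1b` that starts the ANSI reset sequence.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical model of reading a string out of program memory; not Ghidra API.
public class CStringDemo {
    // Reads from offset up to (not including) the first NUL byte, like a TerminatedCString.
    public static String readTerminated(byte[] mem, int offset) {
        int end = offset;
        while (end < mem.length && mem[end] != 0) {
            end++;
        }
        return new String(mem, offset, end - offset, StandardCharsets.ISO_8859_1);
    }

    // Reads at most maxLen bytes (stopping early at NUL), like a fixed-size buffer.
    public static String readBounded(byte[] mem, int offset, int maxLen) {
        int end = offset;
        while (end < mem.length && end - offset < maxLen && mem[end] != 0) {
            end++;
        }
        return new String(mem, offset, end - offset, StandardCharsets.ISO_8859_1);
    }

    // Deliberately simplistic heuristic (an assumption, not Ghidra's string search):
    // accept printable ASCII plus tab, newline and carriage return.
    public static boolean isPrintableAscii(String s) {
        for (char c : s.toCharArray()) {
            boolean printable = c >= 0x20 && c <= 0x7e;
            if (!printable && c != '\t' && c != '\n' && c != '\r') {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Bytes of the second format string from the reproducer, including its NUL terminator.
        byte[] mem = "reproduces error : %s!\n\u001b[0m\0".getBytes(StandardCharsets.ISO_8859_1);
        String full = readTerminated(mem, 0);
        System.out.println(full.length());           // 27
        System.out.println(readBounded(mem, 0, 20)); // reproduces error : %
        System.out.println(isPrintableAscii(full));  // false (ESC is 0x1b)
    }
}
```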
Thank you for the fixes in https://github.com/NationalSecurityAgency/ghidra/issues/4256 and again in https://github.com/NationalSecurityAgency/ghidra/issues/4165!!
To Reproduce
Steps to reproduce the behavior:
- Compile the following C code (or see attachment below):
  ```c
  #include <stdio.h>

  int main() {
      printf("\x1b[41m"); // Red
      printf("reproduces error : %s!\n\x1b[0m", "a string");
      return 0;
  }
  ```
- Load into Ghidra with default analyzers
- Select One-Shot analysis of “Variadic Function Signature Override”
- Inspect user log for error
Expected behavior
No errors, and the whole string is analyzed instead of only the first 20 characters.
It would also be nice if the string with ANSI codes were identified as a `TerminatedCString`, since it is passed to a format string function that takes a `char *`, but I might be oversimplifying here.
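As a rough illustration of the expected behavior, the following hypothetical Java sketch (not Ghidra's `FormatStringParser`) scans the complete string and never indexes past its end. It skips flags, width, precision, and length modifiers, treats `%%` as a literal, and reports a dangling `%` with a descriptive `IllegalArgumentException` instead of an out-of-bounds error. Positional arguments (`%1$s`) and the extra `int` arguments consumed by `*` are deliberately not handled.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical bounds-checked scanner; not the actual FormatStringParser.
public class SpecifierScanner {
    // Characters that may appear between '%' and the conversion character:
    // flags, width/precision digits, '.', '*', and length modifiers.
    private static final String MODIFIERS = "-+ #0123456789.*hlLqjzt";

    // Returns the conversion character of each specifier, e.g. 's' for "%s".
    public static List<Character> conversions(String fmt) {
        List<Character> out = new ArrayList<>();
        int i = 0;
        while (i < fmt.length()) {
            if (fmt.charAt(i) != '%') {
                i++;
                continue;
            }
            i++; // step past '%'
            if (i < fmt.length() && fmt.charAt(i) == '%') {
                i++; // "%%" is a literal percent sign, not an argument
                continue;
            }
            while (i < fmt.length() && MODIFIERS.indexOf(fmt.charAt(i)) >= 0) {
                i++;
            }
            if (i >= fmt.length()) {
                throw new IllegalArgumentException("incomplete specifier at end of: " + fmt);
            }
            out.add(fmt.charAt(i));
            i++;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(conversions("reproduces error : %s!\n\u001b[0m")); // [s]
        System.out.println(conversions("%d%%%5.2f"));                         // [d, f]
        try {
            conversions("reproduces error : %");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```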
Screenshots
In the screenshot, you can see the `DAT_00402013` reference to the string with ANSI escape codes. If I double-click on it, it takes me to the `.rodata` section, where I can right-click and select Data -> `TerminatedCString`. If I re-run the “Variadic Function Signature Override” analysis, it completes without errors.
Attachments
ansi.zip (binary)
Environment
- OS: macOS 12.4
- Java Version: 11
- Ghidra Version: latest `master`, commit 0241b2b97ebc1356850186796153b0e5f509f96e
- Ghidra Origin: locally built
Additional context
I made a patch to help with debugging:
```diff
diff --git a/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java b/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
index 9a8019a74..0b2a6c46d 100644
--- a/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
+++ b/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
@@ -272,8 +272,13 @@ public class FormatStringAnalyzer extends AbstractAnalyzer {
 		// looks like scanf since it takes in inputs. We need this information
 		// so that the correct DataType arguments are generated
 		boolean isOutputType = !callFunctionName.contains(INPUT_FUNCTION_SUBSTRING);
-		List<FormatArgument> formatArguments =
-			parser.convertToFormatArgumentList(formatString, isOutputType);
+		List<FormatArgument> formatArguments;
+		try {
+			formatArguments = parser.convertToFormatArgumentList(formatString, isOutputType);
+		} catch(StringIndexOutOfBoundsException e) {
+			Msg.error(this, "Failed parsing format string: '" + formatString + "' @ 0x" + address);
+			throw e;
+		}
 		DataType[] dataTypes = isOutputType ? parser.convertToOutputDataTypes(formatArguments)
 			: parser.convertToInputDataTypes(formatArguments);
```
It produces the following error message and stack trace:
```
2022-05-24 15:09:09 ERROR (FormatStringAnalyzer) Failed parsing format string: 'reproduces error : %' @ 0x00401148
2022-05-24 15:09:09 ERROR (InternalResultListener) Unexpected exception getting Decompiler result java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at generic.concurrent.QResult.<init>(QResult.java:40)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
at java.base/java.lang.String.charAt(String.java:693)
at ghidra.app.plugin.core.string.variadic.FormatStringParser.parseFormatString(FormatStringParser.java:93)
at ghidra.app.plugin.core.string.variadic.FormatStringParser.convertToFormatArgumentList(FormatStringParser.java:292)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.parseParameters(FormatStringAnalyzer.java:277)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.initSignature(FormatStringAnalyzer.java:323)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideFunctionCall(FormatStringAnalyzer.java:352)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideCallList(FormatStringAnalyzer.java:341)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:211)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:188)
at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:80)
at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:41)
at generic.concurrent.ConcurrentQ$CallbackCallable.call(ConcurrentQ.java:658)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:76)
... 3 more
2022-05-24 15:09:09 ERROR (FormatStringAnalyzer) Error: could not decompile functions with ParallelDecompiler java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
at generic.concurrent.QResult.<init>(QResult.java:40)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:78)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
at java.base/java.lang.String.charAt(String.java:693)
at ghidra.app.plugin.core.string.variadic.FormatStringParser.parseFormatString(FormatStringParser.java:93)
at ghidra.app.plugin.core.string.variadic.FormatStringParser.convertToFormatArgumentList(FormatStringParser.java:292)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.parseParameters(FormatStringAnalyzer.java:277)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.initSignature(FormatStringAnalyzer.java:323)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideFunctionCall(FormatStringAnalyzer.java:352)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideCallList(FormatStringAnalyzer.java:341)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:211)
at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:188)
at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:80)
at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:41)
at generic.concurrent.ConcurrentQ$CallbackCallable.call(ConcurrentQ.java:658)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:76)
... 3 more
```
Assuming our nightly tests pass, you can expect it tomorrow.
Thank you for reporting this. It’s probably worth adding a patch that will handle the exception gracefully and issue some kind of error or warning message.
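The suggestion above, handling the exception gracefully, could look roughly like the hypothetical Java sketch below, which is not Ghidra code. `parseOrWarn` wraps any parser function, turns a failure into a warning that includes the call-site address, and returns `Optional.empty()` so the caller can skip that call site and continue. The stand-in parser is a lambda with an unchecked look-ahead after `%`, an assumption used only to trigger the exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not Ghidra code: a parse failure at one call site becomes a
// warning and that call site is skipped, instead of aborting the whole analysis pass.
public class GracefulFormatAnalysis {
    public static <T> Optional<T> parseOrWarn(String fmt, long address,
            Function<String, T> parser, List<String> warnings) {
        try {
            return Optional.ofNullable(parser.apply(fmt));
        } catch (RuntimeException e) {
            // Record where and why parsing failed; the caller keeps going.
            warnings.add(String.format("Skipping format string at 0x%08x ('%s'): %s",
                    address, fmt, e));
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        List<String> warnings = new ArrayList<>();
        // Stand-in parser with an unchecked look-ahead after the first '%'.
        Function<String, Character> naive = s -> s.charAt(s.indexOf('%') + 1);
        System.out.println(parseOrWarn("reproduces error : %s!", 0x401148L, naive, warnings)); // Optional[s]
        System.out.println(parseOrWarn("reproduces error : %", 0x401148L, naive, warnings));   // Optional.empty
        warnings.forEach(System.out::println);
    }
}
```

Unlike the debugging patch in the issue, which logs and then rethrows, this keeps the analysis running. Catching `RuntimeException` broadly rather than only `StringIndexOutOfBoundsException` is a judgment call.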
I agree that it would be better if there were already a string defined at this location. This should be handled more generally - for a function with a signature you trust (such as a library function), if Ghidra can identify addresses that are passed to the function it should go to those addresses and lay down the appropriate type. We’re mulling this over.
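The general approach described in the comment above can be modeled in a few lines. The sketch below is a toy model under stated assumptions, not Ghidra API: memory types are a `Map` from address to type name, an absent address stands for undefined data (what Ghidra shows as `DAT_...`), and `Call` and `applyStringTypes` are hypothetical names. It only considers the declared (non-variadic) parameters, so a `printf` call contributes its format string but not its `...` arguments.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model, not Ghidra API: an address absent from the map is undefined data.
public class TypePropagationModel {
    // A call to a function whose signature is trusted (e.g. a library function).
    public static final class Call {
        final List<String> paramTypes; // declared (non-variadic) parameter types
        final List<Long> constArgs;    // constant argument values; null if not a constant

        public Call(List<String> paramTypes, List<Long> constArgs) {
            this.paramTypes = paramTypes;
            this.constArgs = constArgs;
        }
    }

    // For each constant address passed as a (const) char pointer, lay down a
    // terminated string type, leaving locations that already have a type untouched.
    public static Map<Long, String> applyStringTypes(List<Call> calls, Map<Long, String> defined) {
        Map<Long, String> result = new HashMap<>(defined);
        for (Call call : calls) {
            int n = Math.min(call.paramTypes.size(), call.constArgs.size());
            for (int i = 0; i < n; i++) {
                Long addr = call.constArgs.get(i);
                boolean isCharPointer =
                        call.paramTypes.get(i).replace("const ", "").equals("char *");
                if (addr != null && isCharPointer) {
                    result.putIfAbsent(addr, "TerminatedCString");
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // printf(const char *format, ...) called with the format string at 0x00402013.
        Call printfCall = new Call(List.of("const char *"), Arrays.asList(0x402013L));
        Map<Long, String> types = applyStringTypes(List.of(printfCall), new HashMap<>());
        System.out.println(types.get(0x402013L)); // TerminatedCString
    }
}
```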