
Format string analysis does not recognize some strings


Describe the bug

Hello, I am running into more issues with format strings in the Variadic Function Signature Override analysis.

It seems that when the analyzer processes a string passed to one of the format string functions (printf, sprintf, etc.) and that string contains an ANSI escape code, it fails with a StringIndexOutOfBoundsException if the string also happens to have a % character as its 20th character. This is due to BUFFER_LENGTH = 20 here: https://github.com/NationalSecurityAgency/ghidra/blob/0241b2b97ebc1356850186796153b0e5f509f96e/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/PcodeFunctionParser.java#L193-L194
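To see why a % in position 20 is fatal, here is a hypothetical, self-contained sketch (not Ghidra's PcodeFunctionParser or FormatStringParser code). It assumes the string is simply cut to BUFFER_LENGTH characters and that the parser reads the character after every % without a bounds check. Under those assumptions the truncated string ends in a dangling %, and reading index 20 throws.

```java
// Hypothetical reproduction of the truncation problem; not Ghidra's parser.
final class TruncationDemo {
    // Mirrors the BUFFER_LENGTH = 20 constant linked in the issue
    // (assumption: the string is simply cut to this many characters).
    static final int BUFFER_LENGTH = 20;

    // Naive scan that assumes every '%' is followed by a conversion character.
    static int countConversionsNaive(String s) {
        int count = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '%') {
                char next = s.charAt(i + 1); // throws if '%' is the last character
                if (next != '%') {
                    count++;
                }
                i++; // skip the conversion character (or the second '%' of "%%")
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String full = "reproduces error : %s!\n";
        String truncated = full.substring(0, Math.min(BUFFER_LENGTH, full.length()));
        System.out.println("'" + truncated + "'"); // prints 'reproduces error : %'
        try {
            countConversionsNaive(truncated);
        } catch (StringIndexOutOfBoundsException e) {
            System.out.println("Caught: " + e);
        }
    }
}
```

The full string parses fine; only the 20-character prefix, which ends exactly on the %, triggers the exception.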

It looks like this error happens because the string containing the ANSI escape code is labeled DAT_xxxx instead of being defined as a TerminatedCString. When I change the data in memory to a TerminatedCString, the analysis passes without error.
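The difference between the two cases can be illustrated with a hypothetical helper, readCString (not a Ghidra API), that reads a NUL-terminated string from a byte array standing in for program memory. Reading up to the terminator keeps the whole string, including the trailing ESC [ 0 m sequence, while capping the read at 20 bytes reproduces the truncated 'reproduces error : %' seen in the log.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not a Ghidra API: reads a NUL-terminated string from a
// byte array that stands in for program memory, reading at most maxLen bytes.
final class CStringReader {
    static String readCString(byte[] mem, int offset, int maxLen) {
        int limit = Math.min(mem.length, offset + maxLen);
        int end = offset;
        while (end < limit && mem[end] != 0) {
            end++;
        }
        // ISO-8859-1 maps each byte to exactly one char, so ESC (0x1B) survives.
        return new String(mem, offset, end - offset, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        byte[] mem = "reproduces error : %s!\n\u001b[0m\0next"
                .getBytes(StandardCharsets.ISO_8859_1);
        String terminated = readCString(mem, 0, 1024); // stops at the NUL byte
        String capped = readCString(mem, 0, 20);       // stops after 20 bytes
        System.out.println(terminated.length()); // 27
        System.out.println("'" + capped + "'");  // 'reproduces error : %'
    }
}
```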

I assume the unprintable character(s) in the ANSI escape code(s) are preventing Ghidra from recognizing this as a string.
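As a sketch of how a string heuristic could tolerate ANSI escape codes, the check below (a hypothetical heuristic, not Ghidra's string analysis) accepts a NUL-terminated run of bytes if every byte is printable ASCII, common whitespace, or, when allowEscape is set, ESC (0x1B), the byte that begins an ANSI escape sequence. The minLen and allowEscape parameters are assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical heuristic, not Ghidra's string analysis: treats a NUL-terminated
// byte run as a C string if every byte is printable ASCII, tab, newline,
// carriage return, or (optionally) ESC (0x1B).
final class AnsiAwareStringCheck {
    static boolean isLikelyCString(byte[] mem, int offset, int minLen, boolean allowEscape) {
        int len = 0;
        for (int i = offset; i < mem.length; i++) {
            int b = mem[i] & 0xFF;
            if (b == 0) {
                return len >= minLen; // found the terminator
            }
            boolean printable = b >= 0x20 && b <= 0x7E;
            boolean whitespace = b == '\t' || b == '\n' || b == '\r';
            boolean escape = allowEscape && b == 0x1B;
            if (!(printable || whitespace || escape)) {
                return false;
            }
            len++;
        }
        return false; // no terminator before the end of memory
    }

    public static void main(String[] args) {
        byte[] red = "\u001b[41m\0".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isLikelyCString(red, 0, 4, false)); // false: ESC rejected
        System.out.println(isLikelyCString(red, 0, 4, true));  // true
    }
}
```

Allowing only ESC, rather than every control byte, keeps the heuristic close to a strict printable-ASCII check while still accepting strings like the ones in this report.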

Thank you for the fixes in https://github.com/NationalSecurityAgency/ghidra/issues/4256 and again in https://github.com/NationalSecurityAgency/ghidra/issues/4165!!

To Reproduce

Steps to reproduce the behavior:

  1. Compile the following C code (or see the attachment below):

```c
#include <stdio.h>

int main() {
  printf("\x1b[41m"); // Red background
  printf("reproduces error : %s!\n\x1b[0m", "a string"); // \x1b[0m resets attributes
  return 0;
}
```

  2. Load the binary into Ghidra with the default analyzers.
  3. Select One-Shot analysis of “Variadic Function Signature Override”.
  4. Inspect the user log for the error.

Expected behavior

No errors, and the whole string is analyzed instead of only the first 20 characters.
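A bounds-checked scan is one way to avoid the crash even when the string is truncated. The sketch below is hypothetical (not Ghidra's FormatStringParser): it collects printf-style conversion characters and stops at a dangling % instead of reading past the end. The flag, width and length-modifier handling is deliberately simplified and does not validate their order.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch, not Ghidra's FormatStringParser: collects printf-style
// conversion characters with explicit bounds checks, so a dangling '%' at the
// end of a (possibly truncated) string ends the scan instead of throwing.
final class SafeFormatScan {
    // Simplification: flags, width, precision and length modifiers are lumped
    // together and their order is not validated.
    static final String FLAGS_WIDTH_LENGTH = "-+ #0123456789.*hlLqjzt";
    static final String CONVERSIONS = "diouxXeEfFgGaAcspn";

    static List<Character> conversions(String fmt) {
        List<Character> out = new ArrayList<>();
        int i = 0;
        while (i < fmt.length()) {
            if (fmt.charAt(i) != '%') {
                i++;
                continue;
            }
            int j = i + 1;
            if (j < fmt.length() && fmt.charAt(j) == '%') {
                i = j + 1; // literal "%%", no argument consumed
                continue;
            }
            while (j < fmt.length() && FLAGS_WIDTH_LENGTH.indexOf(fmt.charAt(j)) >= 0) {
                j++;
            }
            if (j >= fmt.length()) {
                break; // incomplete specifier at end of string
            }
            char c = fmt.charAt(j);
            if (CONVERSIONS.indexOf(c) >= 0) {
                out.add(c);
            }
            i = j + 1;
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(conversions("reproduces error : %s!\n\u001b[0m")); // [s]
        System.out.println(conversions("reproduces error : %"));              // []
        System.out.println(conversions("%5.2f %%  %ld"));                     // [f, d]
    }
}
```

Whether a dangling specifier should be silently dropped or reported is a design choice; this sketch drops it.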

It would also be nice if the string with ANSI codes were identified as a TerminatedCString, since it is passed to a format string function that takes a char *, but I might be oversimplifying here.

Screenshots

[Screenshot: Disassembly]

In the screenshot, you can see the DAT_00402013 reference to the string with ANSI escape codes. Double-clicking it takes me to the .rodata section, where I can right-click and select Data -> TerminatedCString. If I then re-run the “Variadic Function Signature Override” analysis, it completes without errors.

Attachments

ansi.zip (binary)

Environment:

  • OS: macOS 12.4
  • Java Version: 11
  • Ghidra Version: latest master commit 0241b2b97ebc1356850186796153b0e5f509f96e
  • Ghidra Origin: locally built

Additional context

I made a patch to help with debugging:

```diff
diff --git a/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java b/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
index 9a8019a74..0b2a6c46d 100644
--- a/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
+++ b/Ghidra/Features/DecompilerDependent/src/main/java/ghidra/app/plugin/core/string/variadic/FormatStringAnalyzer.java
@@ -272,8 +272,13 @@ public class FormatStringAnalyzer extends AbstractAnalyzer {
 		// looks like scanf since it takes in inputs. We need this information
 		// so that the correct DataType arguments are generated
 		boolean isOutputType = !callFunctionName.contains(INPUT_FUNCTION_SUBSTRING);
-		List<FormatArgument> formatArguments =
-			parser.convertToFormatArgumentList(formatString, isOutputType);
+		List<FormatArgument> formatArguments;
+		try {
+			formatArguments = parser.convertToFormatArgumentList(formatString, isOutputType);
+		} catch(StringIndexOutOfBoundsException e) {
+			Msg.error(this, "Failed parsing format string: '" + formatString + "' @ 0x" + address);
+			throw e;
+		}
 
 		DataType[] dataTypes = isOutputType ? parser.convertToOutputDataTypes(formatArguments)
 				: parser.convertToInputDataTypes(formatArguments);
```

With this patch applied, the analysis logs the following message and stack trace:

2022-05-24 15:09:09 ERROR (FormatStringAnalyzer) Failed parsing format string: 'reproduces error : %' @ 0x00401148
2022-05-24 15:09:09 ERROR (InternalResultListener) Unexpected exception getting Decompiler result java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at generic.concurrent.QResult.<init>(QResult.java:40)
	at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
	at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
	at java.base/java.lang.String.charAt(String.java:693)
	at ghidra.app.plugin.core.string.variadic.FormatStringParser.parseFormatString(FormatStringParser.java:93)
	at ghidra.app.plugin.core.string.variadic.FormatStringParser.convertToFormatArgumentList(FormatStringParser.java:292)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.parseParameters(FormatStringAnalyzer.java:277)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.initSignature(FormatStringAnalyzer.java:323)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideFunctionCall(FormatStringAnalyzer.java:352)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideCallList(FormatStringAnalyzer.java:341)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:211)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:188)
	at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:80)
	at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:41)
	at generic.concurrent.ConcurrentQ$CallbackCallable.call(ConcurrentQ.java:658)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:76)
	... 3 more
 
2022-05-24 15:09:09 ERROR (FormatStringAnalyzer) Error: could not decompile functions with ParallelDecompiler java.util.concurrent.ExecutionException: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
	at java.base/java.util.concurrent.FutureTask.report(FutureTask.java:122)
	at java.base/java.util.concurrent.FutureTask.get(FutureTask.java:191)
	at generic.concurrent.QResult.<init>(QResult.java:40)
	at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:78)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.lang.StringIndexOutOfBoundsException: String index out of range: 20
	at java.base/java.lang.StringLatin1.charAt(StringLatin1.java:47)
	at java.base/java.lang.String.charAt(String.java:693)
	at ghidra.app.plugin.core.string.variadic.FormatStringParser.parseFormatString(FormatStringParser.java:93)
	at ghidra.app.plugin.core.string.variadic.FormatStringParser.convertToFormatArgumentList(FormatStringParser.java:292)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.parseParameters(FormatStringAnalyzer.java:277)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.initSignature(FormatStringAnalyzer.java:323)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideFunctionCall(FormatStringAnalyzer.java:352)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer.overrideCallList(FormatStringAnalyzer.java:341)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:211)
	at ghidra.app.plugin.core.string.variadic.FormatStringAnalyzer$1.process(FormatStringAnalyzer.java:188)
	at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:80)
	at ghidra.app.decompiler.parallel.DecompilerCallback.process(DecompilerCallback.java:41)
	at generic.concurrent.ConcurrentQ$CallbackCallable.call(ConcurrentQ.java:658)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at generic.concurrent.FutureTaskMonitor.run(FutureTaskMonitor.java:76)
	... 3 more

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 7 (5 by maintainers)

Top GitHub Comments

2 reactions
ryanmkurtz commented, Jul 7, 2022

Assuming our nightly tests pass, you can expect it tomorrow.

2 reactions
ghidracadabra commented, May 25, 2022

Thank you for reporting this. It’s probably worth adding a patch that will handle the exception gracefully and issue some kind of error or warning message.
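A minimal sketch of such graceful handling, assuming the format string parser can be passed in as a function: a parse failure is logged as a warning and the call site is skipped instead of aborting the whole analysis run. tryParse and GracefulParse are hypothetical names; the debugging patch above uses Ghidra's Msg.error and rethrows, whereas this sketch uses java.util.logging as a stand-in and returns Optional.empty() so the caller can skip the signature override for that call.

```java
import java.util.Collections;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

// Hypothetical sketch: a parse failure becomes a warning plus an empty result,
// so the caller can skip this call site and keep analyzing the rest.
final class GracefulParse {
    private static final Logger LOG = Logger.getLogger(GracefulParse.class.getName());

    static <T> Optional<List<T>> tryParse(String formatString, long address,
            Function<String, List<T>> parser) {
        try {
            return Optional.of(parser.apply(formatString));
        } catch (RuntimeException e) {
            LOG.warning(String.format("Skipping format string '%s' at 0x%x: %s",
                    formatString, address, e));
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        Function<String, List<Character>> naive = s -> {
            int i = s.indexOf('%');
            if (i < 0) {
                return Collections.<Character>emptyList();
            }
            return Collections.singletonList(s.charAt(i + 1)); // no bounds check
        };
        System.out.println(tryParse("value: %d", 0x401000L, naive));            // Optional[[d]]
        System.out.println(tryParse("reproduces error : %", 0x401148L, naive)); // Optional.empty
    }
}
```

Catching RuntimeException rather than only StringIndexOutOfBoundsException is a judgment call: it keeps one malformed string from stopping the analyzer, at the cost of masking unrelated parser bugs behind a warning.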

I agree that it would be better if there were already a string defined at this location. This should be handled more generally: for a function with a signature you trust (such as a library function), if Ghidra can identify addresses that are passed to the function, it should go to those addresses and lay down the appropriate type. We’re mulling this over.
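The general idea can be modeled without any Ghidra classes. In the hypothetical sketch below, Signature, CallSite and stringAddresses are illustrative names, not Ghidra APIs: for each call to a function with a trusted signature, any constant address passed in a char * parameter is collected as a place where a terminated string could be laid down. Only the fixed parameters are checked; variadic arguments would need the format string itself to type them.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of type propagation from trusted signatures; the types
// and names are illustrative, not Ghidra APIs.
final class StringTypePropagation {
    static final class Signature {
        final List<String> paramTypes; // fixed parameters only (no varargs)
        Signature(String... paramTypes) {
            this.paramTypes = Arrays.asList(paramTypes);
        }
    }

    static final class CallSite {
        final String callee;
        final List<Long> constantArgs; // null where an argument is not a known constant
        CallSite(String callee, Long... constantArgs) {
            this.callee = callee;
            this.constantArgs = Arrays.asList(constantArgs);
        }
    }

    static boolean isCharPointer(String type) {
        String t = type.replace(" ", "");
        return t.equals("char*") || t.equals("constchar*");
    }

    // Returns the constant addresses passed in char* parameters of trusted callees.
    static Set<Long> stringAddresses(Map<String, Signature> trusted, List<CallSite> calls) {
        Set<Long> result = new TreeSet<>();
        for (CallSite call : calls) {
            Signature sig = trusted.get(call.callee);
            if (sig == null) {
                continue; // unknown or untrusted callee: leave its arguments alone
            }
            int n = Math.min(sig.paramTypes.size(), call.constantArgs.size());
            for (int i = 0; i < n; i++) {
                Long arg = call.constantArgs.get(i);
                if (arg != null && isCharPointer(sig.paramTypes.get(i))) {
                    result.add(arg);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Signature> trusted = new HashMap<>();
        trusted.put("printf", new Signature("const char *"));
        List<CallSite> calls = Arrays.asList(
                new CallSite("printf", 0x402013L),   // format string argument
                new CallSite("mystery", 0x403000L)); // callee without a trusted signature
        for (long addr : stringAddresses(trusted, calls)) {
            System.out.println("define string at 0x" + Long.toHexString(addr)); // 0x402013
        }
    }
}
```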
