Bytecode instrumentation does not respect ClassLoader


After adding elastic-apm-agent-1.19.0 to a Java application, I start getting the following exception:

java.lang.ClassCastException: org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor cannot be cast to org.apache.cassandra.concurrent.TracingAwareExecutorService
	at org.apache.cassandra.concurrent.StageManager.getStage(StageManager.java:158)
	at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:582)
	at org.apache.cassandra.net.async.MeteredMessageInConsumer.receive(MeteredMessageInConsumer.java:41)
	at org.apache.cassandra.net.async.connection.In.doPayload(In.java:395)
	at org.apache.cassandra.net.async.connection.In.receive(In.java:438)
	at org.apache.cassandra.net.async.connection.In.receive(In.java:50)
	at one.actor.Actor.run(Actor.java:82)
	at one.actor.Dispatcher.execute(Dispatcher.java:54)
	at one.actor.Actor.tryScheduleToExecute(Actor.java:223)
	at one.actor.Actor.enqueue(Actor.java:129)
	at org.apache.cassandra.net.async.connection.In.access$600(In.java:50)
	at org.apache.cassandra.net.async.connection.In$1.completed(In.java:214)
	at org.apache.cassandra.net.async.connection.In$1.completed(In.java:181)
	at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
	at sun.nio.ch.Invoker$2.run(Invoker.java:218)
	at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)

The only difference is one JVM option: -javaagent:/path/to/elastic-apm-agent-1.19.0.jar. The application works fine without this option.

JDK 8u265, Linux or Windows.

The logs from the agent:

     [java] 2021-01-04 23:41:24,854 [main] INFO  co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.sun.management.OperatingSystemMXBean
     [java] 2021-01-04 23:41:25,083 [main] INFO  co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.19.0 as Launcher on Java 1.8.0_265 Runtime version: 1.8.0_265-b01 VM version: 25.265-b01 (BellSoft) Windows 10 10.0
     [java] 2021-01-04 23:41:37,125 [main] INFO  co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state
     [java] 2021-01-04 23:41:39,340 [elastic-apm-server-healthcheck] WARN  co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server http://localhost:8200/ is not available (Connection refused: connect)
     [java] 2021-01-04 23:41:41,369 [elastic-apm-remote-config-poller] ERROR co.elastic.apm.agent.configuration.ApmServerConfigurationSource - Connection refused: connect
     [java] 2021-01-04 23:42:09,260 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error trying to connect to APM Server. Some details about SSL configurations corresponding the current connection are logged at INFO level.
     [java] 2021-01-04 23:42:09,307 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type JSON_WRITER with this error: Connection refused: connect
     [java] 2021-01-04 23:42:09,308 [elastic-apm-server-reporter] INFO  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 0 seconds (+/-10%)
     [java] 2021-01-04 23:42:39,208 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error trying to connect to APM Server. Some details about SSL configurations corresponding the current connection are logged at INFO level.
     [java] 2021-01-04 23:42:39,209 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type JSON_WRITER with this error: Connection refused: connect
     [java] 2021-01-04 23:42:39,209 [elastic-apm-server-reporter] INFO  co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 1 seconds (+/-10%)
     [java] 2021-01-04 23:42:53,844 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor refers to a missing class.
     [java] 2021-01-04 23:42:53,848 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.JMXEnabledScheduledThreadPoolExecutorMBean refers to a missing class.
     [java] 2021-01-04 23:42:54,699 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable refers to a missing class.
     [java] 2021-01-04 23:42:55,158 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.CFStatement refers to a missing class.
     [java] 2021-01-04 23:42:55,159 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.ModificationStatement$Parsed refers to a missing class.
     [java] 2021-01-04 23:42:55,162 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.CQLStatement refers to a missing class.
     [java] 2021-01-04 23:42:55,165 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.SchemaAlteringStatement refers to a missing class.
     [java] 2021-01-04 23:42:55,252 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.PermissionAlteringStatement refers to a missing class.
     [java] 2021-01-04 23:42:55,255 [main] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.AuthenticationStatement refers to a missing class.
     [java] 2021-01-04 23:42:56,912 [Fabric:7002:0] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.db.BlacklistedDirectoriesMBean refers to a missing class.
     [java] 2021-01-04 23:42:58,701 [Fabric:7002:0] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.net.async.connection.Connection$Message refers to a missing class.
     [java] 2021-01-04 23:42:59,178 [Fabric:7002:1] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.ThreadPoolsMonitorMBean refers to a missing class.
     [java] 2021-01-04 23:42:59,256 [Fabric:7002:1] INFO  co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.TracingAwareExecutorService refers to a missing class.

Originally, the JMXEnabledThreadPoolExecutor class extends DebuggableThreadPoolExecutor, which implements the TracingAwareExecutorService interface. However, after bytecode transformation, DebuggableThreadPoolExecutor no longer implements TracingAwareExecutorService, which is why the cast in the stack trace above fails.

Analysis

This Java application uses two Cassandra clients of two different versions. These versions are not compatible with each other, but their package and class names overlap. To resolve the name conflicts, the classes of the newer Cassandra are loaded by a separate ClassLoader from a location unreachable by the System ClassLoader. Meanwhile, the older Cassandra classes are on the application classpath (i.e. loaded by the System ClassLoader).

Elastic APM Agent instruments ThreadPoolExecutors by default. When instrumenting JMXEnabledThreadPoolExecutor of the newer Cassandra, it loads classes with the System ClassLoader instead of the custom ClassLoader used to load the newer Cassandra client. As a result, the wrong classes are loaded.

Example

I’ve attached a reduced standalone test case that demonstrates the wrong behavior: threadpooltest.zip

When running java -jar threadpooltest.zip, it prints

Test passed

But when running java -javaagent:/path/to/elastic-apm-agent-1.19.0.jar -jar threadpooltest.zip, the result is

Exception in thread "main" java.lang.AssertionError: Class hierarchy changed!
        at pool.ThreadPoolTest.run(ThreadPoolTest.java:8)
        at Main.main(Main.java:6)

The sources for the test: src.zip

Issue Analytics

  • State: closed
  • Created 3 years ago
  • Reactions: 1
  • Comments: 16 (8 by maintainers)

Top GitHub Comments

raphw commented, Jan 14, 2021 (3 reactions)

OK, I understand the problem now. You can argue that the CustomClassLoader is incorrectly implemented, but of course this scenario should be handled correctly by Byte Buddy anyway. So here is the anatomy of the bug:

  • When CustomThreadPoolExecutor is loaded, this class is passed to the class file transformer first. While instrumenting, the agent checks whether the class is an ExecutorService, which requires navigating the type hierarchy of the class. On the way, DebuggableThreadPoolExecutor needs to be resolved from the class loader by its class file, since it is not legal to load classes during the instrumentation of another class (in overly simplified words). Byte Buddy invokes classLoader.getResourceAsStream("pool/DebuggableThreadPoolExecutor.class") to locate the class file from the class loader. This lookup is delegated to the system class loader since the CustomClassLoader does not handle resource lookups.
  • Byte Buddy is now satisfied with the type description and resolves the super type matcher, concluding that CustomThreadPoolExecutor is indeed an ExecutorService.
  • Next, DebuggableThreadPoolExecutor is instrumented. Byte Buddy was instructed to cache type descriptions to avoid duplicate class file processing. The cached description of DebuggableThreadPoolExecutor now reflects the system class loader’s version of the class, without the interface. When the description is serialized back to a byte array, the interface is missing.
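
The resource-lookup delegation in the first step can be reproduced without any agent. The following is a minimal sketch, not the code from the attached test case: ResourceLookupDemo, ChildFirstLoader and Payload are hypothetical names, and the loader is assumed to override only class loading while inheriting getResource from java.lang.ClassLoader, whose default implementation asks the parent first.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

public class ResourceLookupDemo {

    /** Hypothetical stand-in for a class that exists once per class loader. */
    public static class Payload {}

    /**
     * Hypothetical child-first loader: it defines the target class itself,
     * but does not override any resource lookup method.
     */
    static class ChildFirstLoader extends ClassLoader {
        private final String target;
        private final byte[] classBytes;

        ChildFirstLoader(ClassLoader parent, String target, byte[] classBytes) {
            super(parent);
            this.target = target;
            this.classBytes = classBytes;
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
            if (!name.equals(target)) {
                return super.loadClass(name, resolve); // everything else: parent first
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> c = findLoadedClass(name);
                if (c == null) {
                    c = defineClass(name, classBytes, 0, classBytes.length);
                }
                if (resolve) {
                    resolveClass(c);
                }
                return c;
            }
        }
    }

    static String resourceName(String className) {
        return className.replace('.', '/') + ".class";
    }

    static byte[] readClassFile(ClassLoader loader, String className) throws IOException {
        try (InputStream in = loader.getResourceAsStream(resourceName(className))) {
            if (in == null) {
                throw new IOException("No class file found for " + className);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[4096];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        }
    }

    /** Returns {class defined by child, class differs from parent's, resource served by parent}. */
    static boolean[] observe() throws Exception {
        ClassLoader parent = ResourceLookupDemo.class.getClassLoader();
        String name = Payload.class.getName();
        ChildFirstLoader child = new ChildFirstLoader(parent, name, readClassFile(parent, name));

        Class<?> fromChild = child.loadClass(name);
        URL viaChild = child.getResource(resourceName(name));
        URL viaParent = parent.getResource(resourceName(name));

        return new boolean[] {
            fromChild.getClassLoader() == child,
            fromChild != Payload.class,
            viaChild != null && viaChild.toString().equals(String.valueOf(viaParent))
        };
    }

    public static void main(String[] args) throws Exception {
        boolean[] r = observe();
        System.out.println("class defined by child loader:    " + r[0]);
        System.out.println("class differs from parent's view: " + r[1]);
        System.out.println("class file resolved via parent:   " + r[2]);
    }
}
```

In this sketch both loaders happen to see identical bytes, so nothing breaks. In the scenario from the issue, the parent holds an older, incompatible class under the same name, so whoever reads the class file through the custom loader gets a description of the wrong class. A loader that also overrides the resource lookup for its own classes would presumably avoid the mismatch.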

In a way, Byte Buddy is tricked into this behavior by resolving the incorrect class file for DebuggableThreadPoolExecutor which lets it believe that it already knows the correct description. But of course, ideally Byte Buddy would ignore any cached description for the currently instrumented class since the class file that is provided by the instrumentation API is guaranteed to be correct.
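
To make the caching problem concrete, here is a deliberately simplified model of the two behaviors described above. It does not use Byte Buddy and is not its implementation: DescriptionCache and its methods are hypothetical, a "description" is reduced to a set of interface names, the pool.TracingAwareExecutorService name is guessed from the test case's pool package, and the older class is assumed to declare no interfaces at all.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

public class StaleDescriptionDemo {

    static final String EXECUTOR = "pool.DebuggableThreadPoolExecutor";
    static final String TRACING = "pool.TracingAwareExecutorService";

    /** A "type description" here is just the set of implemented interface names. */
    static Set<String> interfaces(String... names) {
        return new LinkedHashSet<>(Arrays.asList(names));
    }

    /** Hypothetical cache keyed by class name, filled from class loader resource lookups. */
    static final class DescriptionCache {
        private final Map<String, Set<String>> cache = new HashMap<>();
        private final Function<String, Set<String>> resourceLookup;

        DescriptionCache(Function<String, Set<String>> resourceLookup) {
            this.resourceLookup = resourceLookup;
        }

        /** Used while walking the hierarchy of some other class. */
        Set<String> describe(String className) {
            return cache.computeIfAbsent(className, resourceLookup);
        }

        /** Problematic variant: a cached entry wins over the transformer's input. */
        Set<String> forTransformTrustingCache(String className, Set<String> fromTransformer) {
            return cache.getOrDefault(className, fromTransformer);
        }

        /** Intended variant: the class file handed to the transformer always wins. */
        Set<String> forTransformPreferringInput(String className, Set<String> fromTransformer) {
            cache.put(className, fromTransformer);
            return fromTransformer;
        }
    }

    static boolean keepsInterface(boolean preferTransformerInput) {
        // Resource lookups are answered by the parent loader, which only knows
        // the older class (assumed here to implement no interfaces).
        DescriptionCache cache = new DescriptionCache(name -> interfaces());

        // Step 1: matching a subclass walks the hierarchy and caches the parent's view.
        cache.describe(EXECUTOR);

        // Step 2: the real class file of the newer class reaches the transformer.
        Set<String> actual = interfaces(TRACING);
        Set<String> used = preferTransformerInput
                ? cache.forTransformPreferringInput(EXECUTOR, actual)
                : cache.forTransformTrustingCache(EXECUTOR, actual);

        return used.contains(TRACING);
    }

    public static void main(String[] args) {
        System.out.println("trusting cache keeps interface:               " + keepsInterface(false));
        System.out.println("preferring transformer input keeps interface: " + keepsInterface(true));
    }
}
```

The first line prints false and the second prints true: once the stale entry is in the cache, only the variant that prefers the transformer's input retains the interface.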

I will work on a fix for it; however, Byte Buddy can only read linked class files by querying the class loader and matchers might not be applied correctly if class files are misrepresenting actual features such as interfaces that are being implemented.

raphw commented, Jan 15, 2021 (2 reactions)

It does now work with the current Byte Buddy master snapshot. I just ran the example with it and it works. I will release this some time next week, I think, to allow for resolving this issue.
