Bytecode instrumentation does not respect ClassLoader
After adding elastic-apm-agent-1.19.0 to a Java application, I start getting the following exception:
java.lang.ClassCastException: org.apache.cassandra.concurrent.JMXEnabledThreadPoolExecutor cannot be cast to org.apache.cassandra.concurrent.TracingAwareExecutorService
at org.apache.cassandra.concurrent.StageManager.getStage(StageManager.java:158)
at org.apache.cassandra.net.MessagingService.receive(MessagingService.java:582)
at org.apache.cassandra.net.async.MeteredMessageInConsumer.receive(MeteredMessageInConsumer.java:41)
at org.apache.cassandra.net.async.connection.In.doPayload(In.java:395)
at org.apache.cassandra.net.async.connection.In.receive(In.java:438)
at org.apache.cassandra.net.async.connection.In.receive(In.java:50)
at one.actor.Actor.run(Actor.java:82)
at one.actor.Dispatcher.execute(Dispatcher.java:54)
at one.actor.Actor.tryScheduleToExecute(Actor.java:223)
at one.actor.Actor.enqueue(Actor.java:129)
at org.apache.cassandra.net.async.connection.In.access$600(In.java:50)
at org.apache.cassandra.net.async.connection.In$1.completed(In.java:214)
at org.apache.cassandra.net.async.connection.In$1.completed(In.java:181)
at sun.nio.ch.Invoker.invokeUnchecked(Invoker.java:126)
at sun.nio.ch.Invoker$2.run(Invoker.java:218)
at sun.nio.ch.AsynchronousChannelGroupImpl$1.run(AsynchronousChannelGroupImpl.java:112)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
The only difference is one JVM option: -javaagent:/path/to/elastic-apm-agent-1.19.0.jar. The application works fine without this option.
JDK 8u265, Linux or Windows.
The logs from the agent:
[java] 2021-01-04 23:41:24,854 [main] INFO co.elastic.apm.agent.util.JmxUtils - Found JVM-specific OperatingSystemMXBean interface: com.sun.management.OperatingSystemMXBean
[java] 2021-01-04 23:41:25,083 [main] INFO co.elastic.apm.agent.configuration.StartupInfo - Starting Elastic APM 1.19.0 as Launcher on Java 1.8.0_265 Runtime version: 1.8.0_265-b01 VM version: 25.265-b01 (BellSoft) Windows 10 10.0
[java] 2021-01-04 23:41:37,125 [main] INFO co.elastic.apm.agent.impl.ElasticApmTracer - Tracer switched to RUNNING state
[java] 2021-01-04 23:41:39,340 [elastic-apm-server-healthcheck] WARN co.elastic.apm.agent.report.ApmServerHealthChecker - Elastic APM server http://localhost:8200/ is not available (Connection refused: connect)
[java] 2021-01-04 23:41:41,369 [elastic-apm-remote-config-poller] ERROR co.elastic.apm.agent.configuration.ApmServerConfigurationSource - Connection refused: connect
[java] 2021-01-04 23:42:09,260 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error trying to connect to APM Server. Some details about SSL configurations corresponding the current connection are logged at INFO level.
[java] 2021-01-04 23:42:09,307 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type JSON_WRITER with this error: Connection refused: connect
[java] 2021-01-04 23:42:09,308 [elastic-apm-server-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 0 seconds (+/-10%)
[java] 2021-01-04 23:42:39,208 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Error trying to connect to APM Server. Some details about SSL configurations corresponding the current connection are logged at INFO level.
[java] 2021-01-04 23:42:39,209 [elastic-apm-server-reporter] ERROR co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Failed to handle event of type JSON_WRITER with this error: Connection refused: connect
[java] 2021-01-04 23:42:39,209 [elastic-apm-server-reporter] INFO co.elastic.apm.agent.report.IntakeV2ReportingEventHandler - Backing off for 1 seconds (+/-10%)
[java] 2021-01-04 23:42:53,844 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor refers to a missing class.
[java] 2021-01-04 23:42:53,848 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.JMXEnabledScheduledThreadPoolExecutorMBean refers to a missing class.
[java] 2021-01-04 23:42:54,699 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.DebuggableScheduledThreadPoolExecutor$UncomplainingRunnable refers to a missing class.
[java] 2021-01-04 23:42:55,158 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.CFStatement refers to a missing class.
[java] 2021-01-04 23:42:55,159 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.ModificationStatement$Parsed refers to a missing class.
[java] 2021-01-04 23:42:55,162 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.CQLStatement refers to a missing class.
[java] 2021-01-04 23:42:55,165 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.SchemaAlteringStatement refers to a missing class.
[java] 2021-01-04 23:42:55,252 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.PermissionAlteringStatement refers to a missing class.
[java] 2021-01-04 23:42:55,255 [main] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.cql3.statements.AuthenticationStatement refers to a missing class.
[java] 2021-01-04 23:42:56,912 [Fabric:7002:0] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.db.BlacklistedDirectoriesMBean refers to a missing class.
[java] 2021-01-04 23:42:58,701 [Fabric:7002:0] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.net.async.connection.Connection$Message refers to a missing class.
[java] 2021-01-04 23:42:59,178 [Fabric:7002:1] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.ThreadPoolsMonitorMBean refers to a missing class.
[java] 2021-01-04 23:42:59,256 [Fabric:7002:1] INFO co.elastic.apm.agent.bci.bytebuddy.ErrorLoggingListener - org.apache.cassandra.concurrent.TracingAwareExecutorService refers to a missing class.
Originally, the JMXEnabledThreadPoolExecutor class extends DebuggableThreadPoolExecutor, and DebuggableThreadPoolExecutor implements the TracingAwareExecutorService interface. However, after bytecode transformation, DebuggableThreadPoolExecutor no longer implements TracingAwareExecutorService.
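To make the expected hierarchy concrete, here is a minimal sketch with hypothetical stand-in types that only borrow the Cassandra class names (they are not the real Cassandra classes, and HierarchyCheck is a made-up name). It shows the relationship that the failing cast in StageManager.getStage relies on:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-ins that mirror the hierarchy described above.
public class HierarchyCheck {

    public interface TracingAwareExecutorService extends ExecutorService {}

    public static class DebuggableThreadPoolExecutor extends ThreadPoolExecutor
            implements TracingAwareExecutorService {
        public DebuggableThreadPoolExecutor() {
            // One core thread, one max thread, no keep-alive, unbounded queue.
            super(1, 1, 0L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        }
    }

    public static class JMXEnabledThreadPoolExecutor extends DebuggableThreadPoolExecutor {}

    // True when the given type can be cast to TracingAwareExecutorService.
    public static boolean implementsTracingAware(Class<?> type) {
        return TracingAwareExecutorService.class.isAssignableFrom(type);
    }

    public static void main(String[] args) {
        System.out.println(implementsTracingAware(JMXEnabledThreadPoolExecutor.class));
    }
}
```

With an intact hierarchy the check holds for the subclass as well. Once the interface disappears from DebuggableThreadPoolExecutor, the same check fails and the cast throws ClassCastException.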
Analysis
This Java application uses two Cassandra clients of two different versions. These versions are not compatible with each other, but their packages / class names intersect. In order to resolve name conflicts, the classes of newer Cassandra are loaded by a separate ClassLoader from a location unreachable by the System ClassLoader. Meanwhile, the older Cassandra classes are on the application classpath (i.e. loaded by the System ClassLoader).
Elastic APM Agent instruments ThreadPoolExecutors by default. When instrumenting the JMXEnabledThreadPoolExecutor of the newer Cassandra, it loads classes with the System ClassLoader instead of the custom ClassLoader used to load the newer Cassandra client. This results in loading the wrong classes.
Example
I’ve attached a reduced standalone test case that demonstrates the wrong behavior: threadpooltest.zip
When running java -jar threadpooltest.zip, it prints
Test passed
But when running java -javaagent:/path/to/elastic-apm-agent-1.19.0.jar -jar threadpooltest.zip, the result is
Exception in thread "main" java.lang.AssertionError: Class hierarchy changed!
at pool.ThreadPoolTest.run(ThreadPoolTest.java:8)
at Main.main(Main.java:6)
The sources for the test: src.zip
Top GitHub Comments
Ok, I understand the problem now. You can argue that the CustomClassLoader is incorrectly implemented, but of course this scenario should be handled correctly by Byte Buddy anyway. So here is the anatomy of the bug:
1. CustomThreadPoolExecutor is loaded; this class is passed to the class file transformer first. While instrumenting, the agent checks whether the class is an instance of ExecutorService, which requires navigating the type hierarchy of the class. On the way, DebuggableThreadPoolExecutor needs to be resolved from the class loader by its class file, since it is not legal to load classes during the instrumentation of another class (in overly simplified words). Byte Buddy invokes classLoader.getResourceAsStream("/pool/DebuggableThreadPoolExecutor.class") to locate the class file from the class loader. This is passed down to the system class loader since the CustomClassLoader does not handle resource lookups.
2. CustomThreadPoolExecutor is indeed an ExecutorService.
3. DebuggableThreadPoolExecutor is instrumented. Byte Buddy was instructed to cache type descriptions to avoid duplicate class file processing. The cached description of DebuggableThreadPoolExecutor now reflects the system class loader’s version of the class, without the interface. When the description is serialized back to a byte array, the interface is missing.
In a way, Byte Buddy is tricked into this behavior by resolving the incorrect class file for DebuggableThreadPoolExecutor, which lets it believe that it already knows the correct description. But of course, ideally Byte Buddy would ignore any cached description for the currently instrumented class, since the class file that is provided by the instrumentation API is guaranteed to be correct.
I will work on a fix for it; however, Byte Buddy can only read linked class files by querying the class loader, and matchers might not be applied correctly if class files misrepresent actual features such as the interfaces being implemented.
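The caching problem described above can be sketched in plain Java. This is not Byte Buddy's code: DescriptionCacheSketch, TypeDescription and all method names below are hypothetical, and a "description" is reduced to a class name plus its interface names. The sketch only contrasts trusting a cached entry with preferring the class file handed to the transformer:

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of a type-description cache; not Byte Buddy's implementation.
public class DescriptionCacheSketch {

    // Stand-in for a parsed class file: a name and the interfaces it declares.
    public static final class TypeDescription {
        private final String name;
        private final List<String> interfaces;

        public TypeDescription(String name, String... interfaces) {
            this.name = name;
            this.interfaces = Collections.unmodifiableList(Arrays.asList(interfaces));
        }

        public String name() { return name; }
        public List<String> interfaces() { return interfaces; }
    }

    private final Map<String, TypeDescription> cache = new HashMap<>();

    // Resolving a supertype while another class is being instrumented:
    // the description comes from a resource lookup and is cached.
    public TypeDescription resolveByLookup(String name, Map<String, TypeDescription> resourceView) {
        return cache.computeIfAbsent(name, resourceView::get);
    }

    // Buggy variant: a cached entry wins even for the class being instrumented.
    public TypeDescription describeBuggy(String name, TypeDescription fromTransformer) {
        return cache.getOrDefault(name, fromTransformer);
    }

    // Fixed variant: the class file passed to the transformer is authoritative
    // and replaces any stale cached entry.
    public TypeDescription describeFixed(String name, TypeDescription fromTransformer) {
        cache.put(name, fromTransformer);
        return fromTransformer;
    }
}
```

If the resource lookup returns the system class loader's variant of pool.DebuggableThreadPoolExecutor (without the interface), describeBuggy keeps returning that stale variant, while describeFixed returns the description that carries the interface.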
It does now work with the current Byte Buddy master snapshot. I just ran the example with it and it works. I will release this some time next week, I think, to allow for resolving this issue.
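Independently of the Byte Buddy fix, the remark that the CustomClassLoader does not handle resource lookups suggests a change on the application side: make the loader serve class files as resources from the same place it loads classes from. The following is only a sketch under assumptions. InMemoryClassLoader is a hypothetical name, the class files are assumed to be available as byte arrays keyed by resource path, and the real CustomClassLoader from the test case may look quite different:

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical child-first loader that answers class and resource lookups
// from the same in-memory store, so both views of a class agree.
public class InMemoryClassLoader extends ClassLoader {

    // Resource path (for example "pool/Foo.class") mapped to class file bytes.
    private final Map<String, byte[]> classFiles;

    public InMemoryClassLoader(Map<String, byte[]> classFiles, ClassLoader parent) {
        super(parent);
        this.classFiles = new HashMap<>(classFiles);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                byte[] bytes = classFiles.get(name.replace('.', '/') + ".class");
                if (bytes == null) {
                    // Not one of ours: fall back to normal parent delegation.
                    return super.loadClass(name, resolve);
                }
                loaded = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    @Override
    public InputStream getResourceAsStream(String name) {
        // Serve our own class files first, so a tool that reads class files
        // through this loader sees the same bytes that loadClass defines.
        byte[] bytes = classFiles.get(name);
        return bytes != null ? new ByteArrayInputStream(bytes) : super.getResourceAsStream(name);
    }
}
```

The sketch covers only getResourceAsStream, the call named in the comment above. A complete loader would probably also override findResource and findResources so that URL-based lookups return consistent results.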