
Inferring spans representing method executions based on a statistical profiler

See original GitHub issue

Motivation

Our agents mostly capture I/O events like incoming and outgoing HTTP requests. So when most of the time is spent there, it’s easy to troubleshoot latency issues. However, if the application is slow because of inefficient code, users have to manually instrument the application. But in some cases that’s not feasible, for example when the code base is huge and it’s unclear which methods cause the slowdown.

The trace_methods configuration option is often used to match a large portion, if not all, of the methods within a codebase. The documentation warns that this can significantly increase the overhead and degrade the application’s performance.
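For orientation, such a setup might look like the following in the agent’s configuration file; the package pattern and the threshold value are illustrative placeholders, not recommendations.

```properties
# elasticapm.properties -- illustrative values; com.example.myapp.* is a placeholder pattern
trace_methods=com.example.myapp.*
# only create spans for matched methods that run longer than this
trace_methods_duration_threshold=100ms
```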

The slowdown is inherent to the way trace_methods works. All methods which match the trace_methods expression are instrumented so that a timer is started at the beginning and stopped at the end of the method. If the execution time was significant enough (trace_methods_duration_threshold), a span is created. When frequently executing methods are instrumented, even a small overhead per invocation adds up and can significantly slow down the application. Instrumenting a method can also hinder optimizations the JIT could normally do, like inlining.
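Conceptually, the instrumentation applied by trace_methods amounts to wrapping every matched method roughly like this hand-written sketch (the class and method names are placeholders, not the agent’s actual API):

```java
// Illustrative sketch only: roughly what trace_methods-style instrumentation adds around a
// matched method. The names below are placeholders, not the agent's actual API.
public class TimedMethodSketch {

    // corresponds to trace_methods_duration_threshold
    private static final long THRESHOLD_NANOS = 100_000_000L; // 100ms

    public static int handleRequest(int input) {
        long start = System.nanoTime();
        try {
            return doHandleRequest(input); // the original method body
        } finally {
            long duration = System.nanoTime() - start;
            // A span is only created if the execution was slow enough...
            if (duration >= THRESHOLD_NANOS) {
                System.out.println("would report span: handleRequest took " + duration + "ns");
            }
            // ...but the timing code itself runs on every invocation, which is where the
            // overhead for frequently executed methods comes from, and the extra code can
            // prevent the JIT from inlining the method.
        }
    }

    private static int doHandleRequest(int input) {
        return input * 2;
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(21));
    }
}
```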

The new approach

This issue is about adding an alternative to trace_methods which does not require instrumenting any methods. Instead, a sampling (aka statistical) profiler would be used as the foundation. These profilers work by gathering the stack traces of the application at frequent intervals, like every 20ms.
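As a rough illustration of what such a sampling profiler does (this is not how the agent implements it), the following sketch snapshots the stack traces of all threads every 20ms:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a sampling (statistical) profiler: periodically snapshot all stack traces.
// This is not how the agent implements it; it only illustrates the general idea.
public class SamplingProfilerSketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> {
            long timestamp = System.nanoTime();
            // One sample: the stack trace of every live thread at this instant.
            Map<Thread, StackTraceElement[]> sample = Thread.getAllStackTraces();
            // A real profiler would store (timestamp, thread, stack trace) for later analysis
            // instead of printing; consecutive samples containing the same frame indicate
            // that the corresponding method was (probably) executing for that whole interval.
            System.out.println(timestamp + ": sampled " + sample.size() + " threads");
        }, 0, 20, TimeUnit.MILLISECONDS); // e.g. every 20ms

        Thread.sleep(200);
        sampler.shutdownNow();
    }
}
```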

By correlating the stack traces with which span was active on which thread at what time, we can build a call tree from the stack traces, correlate it to a span, and create spans from it. As far as the UI is concerned, those spans look just like regular spans, so no changes are required in the UI or the APM Server. At a later stage, we could make the UI aware of the profiler-inferred spans and display them with a special icon or color.
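A simplified sketch of the inference step, with assumed data structures rather than the agent’s internals: runs of consecutive samples that show the same application frame on a thread are folded into one inferred span, which would then be attached as a child of the span that was active on that thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of turning consecutive profiler samples into inferred spans.
// Names and data structures are illustrative assumptions, not the agent's internals.
public class InferredSpanSketch {

    record Sample(long timestampNanos, String topAppFrame) {}
    record InferredSpan(String methodName, long startNanos, long endNanos) {}

    // Fold runs of consecutive samples that show the same application frame into one span.
    static List<InferredSpan> inferSpans(List<Sample> samplesForOneThread, long samplingIntervalNanos) {
        List<InferredSpan> spans = new ArrayList<>();
        Sample runStart = null;
        Sample previous = null;
        for (Sample current : samplesForOneThread) {
            if (runStart == null) {
                runStart = current;
            } else if (!current.topAppFrame().equals(previous.topAppFrame())) {
                // The frame changed: close the previous run as an inferred span.
                spans.add(new InferredSpan(previous.topAppFrame(),
                        runStart.timestampNanos(),
                        previous.timestampNanos() + samplingIntervalNanos));
                runStart = current;
            }
            previous = current;
        }
        if (runStart != null) {
            spans.add(new InferredSpan(previous.topAppFrame(),
                    runStart.timestampNanos(),
                    previous.timestampNanos() + samplingIntervalNanos));
        }
        return spans;
    }

    public static void main(String[] args) {
        long ms = 1_000_000L;
        List<Sample> samples = List.of(
                new Sample(0, "OrderService#calculatePrice"),
                new Sample(20 * ms, "OrderService#calculatePrice"),
                new Sample(40 * ms, "OrderService#calculatePrice"),
                new Sample(60 * ms, "InvoiceRenderer#render"));
        // Each inferred span would then be attached as a child of whatever real span
        // was active on that thread during the same time window.
        inferSpans(samples, 20 * ms).forEach(System.out::println);
    }
}
```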

The tradeoffs

The duration of the spans won’t be as accurate, as we’re not measuring the execution time exactly but rather estimating the duration based on the number of consecutive stack traces in which a method has been present.
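For example, with a 20ms sampling interval, a method seen in 5 consecutive samples would get an estimated duration of roughly 100ms, while its true duration could be anywhere between just over 80ms and just under 120ms:

```java
// Back-of-the-envelope accuracy of profiler-inferred durations (illustrative numbers).
public class DurationEstimateSketch {
    public static void main(String[] args) {
        long samplingIntervalMs = 20;   // e.g. one sample every 20ms
        int consecutiveSamples = 5;     // method was present in 5 consecutive samples

        long estimatedMs = consecutiveSamples * samplingIntervalMs;          // 100ms
        long minTrueMs = (consecutiveSamples - 1) * samplingIntervalMs;      // just over  80ms
        long maxTrueMs = (consecutiveSamples + 1) * samplingIntervalMs;      // just under 120ms

        System.out.printf("estimated %dms, true duration somewhere in (%dms, %dms)%n",
                estimatedMs, minTrueMs, maxTrueMs);
    }
}
```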

To reduce the overhead, the profiler won’t be active all the time. Instead, it’s active for the first 10 seconds of every minute, by default. Only transactions that happen within a profiling session will have profiler-inferred spans.
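A minimal sketch of that duty cycle, using the default 10s-per-60s values described above (the scheduling code is an illustration, not the agent’s implementation):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustration of the profiling duty cycle: a 10s profiling session at the start of every
// minute. This is not the agent's actual scheduler, just a sketch of the idea; it runs
// until the JVM is stopped.
public class ProfilingDutyCycleSketch {
    public static void main(String[] args) {
        long profilingDurationSeconds = 10; // length of one profiling session
        long intervalSeconds = 60;          // one session per minute

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("profiling session started");
            // A real implementation would start the sampler here, record for the session
            // length, then stop it. Only transactions that overlap this window can get
            // profiler-inferred spans.
            try {
                TimeUnit.SECONDS.sleep(profilingDurationSeconds); // placeholder for recording
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("profiling session finished");
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }
}
```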

Q&A

  • Will trace_methods be removed? There are no plans to remove that option, as it can still be useful in combination with or instead of profiler-inferred spans.
  • How can I try this out? Use a snapshot from https://github.com/elastic/apm-agent-java/pull/972 or build the https://github.com/felixbarny/apm-agent-java/tree/inferred-spans branch.
  • How does this relate to https://github.com/elastic/apm/issues/121?
    • The CPU profiling proposal is about having a macro-level view of what the service is doing. This is great to find out about the hotspots of an application. Optimizing those can have a big overall effect on the application. However, it’s less useful to troubleshoot latency if the application is mostly idle waiting for I/O.
    • This issue is about creating regular spans for long executing methods. It’s mostly used to troubleshoot latency for a specific instance of a transaction.
    • There’s no intention to deprecate one in favor of the other. Both are very useful tools to have in your toolbox when optimizing an application.
    • Both can use the same underlying sampling profiler. For the CPU profiling, the stack trace is only considered for threads in a RUNNABLE thread state. Also, for CPU profiling, there will be only one flattened data structure, representing a flame graph for the whole profiling session.
  • Which settings are there to configure the profiler? See the docs preview; an illustrative configuration sketch follows this list.
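For orientation only, a configuration sketch for the profiler settings; the option names and defaults below are assumptions based on the later released documentation, so treat the docs preview linked above as authoritative:

```properties
# Illustrative only; option names and values are assumptions, see the docs preview
# referenced in the Q&A above for the authoritative list of settings.
profiling_inferred_spans_enabled=true
profiling_inferred_spans_sampling_interval=50ms
profiling_inferred_spans_min_duration=0ms
```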

Screenshot

[Screen Shot 2019-12-04 at 16 07 52]

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 23 (16 by maintainers)

Top GitHub Comments

2 reactions
felixbarny commented, Jan 27, 2020

Update: I was able to integrate async-profiler with the agent 🎉

Async-profiler has much lower overhead than ThreadMXBean#getThreadInfo and does not rely on safepoints.

The biggest downside is that async-profiler does not work on Windows. I think it still makes sense to only support async-profiler and not have a ThreadMXBean#getThreadInfo fallback on Windows. The time to reach a safepoint (which means a stop-the-world pause for the application) is quite unpredictable and can regularly be as high as 5ms.
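For context, the ThreadMXBean-based sampling that async-profiler replaces looks roughly like the sketch below (not the agent’s actual code); each such snapshot requires all threads to reach a safepoint, which is the source of the unpredictable pauses mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch of the ThreadMXBean#getThreadInfo approach that async-profiler replaces.
// Each call brings the JVM to a safepoint (a stop-the-world pause), which is what makes
// this approach comparatively expensive and its latency unpredictable.
public class ThreadMxBeanSamplingSketch {
    public static void main(String[] args) {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        long[] threadIds = threadMXBean.getAllThreadIds();
        // Capture up to 128 stack frames for every live thread in one snapshot.
        ThreadInfo[] snapshot = threadMXBean.getThreadInfo(threadIds, 128);
        for (ThreadInfo info : snapshot) {
            if (info != null) {
                System.out.println(info.getThreadName() + ": "
                        + info.getStackTrace().length + " frames");
            }
        }
    }
}
```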

How it works in a nutshell:

  • The agent starts async-profiler in wallclock mode for 10s and lets it write to a JFR (Java Flight Recorder format) file.
  • During that profiling session, the agent captures span activation events and writes them to a ring buffer in a garbage-free way.
  • The events in the ring buffer are consumed by a thread that writes them into a memory-mapped file (MappedByteBuffer) of around 10 MB, which can hold around 100k activation events.
  • After the profiling session is over, the binary JFR file is read via a MappedByteBuffer (which doesn’t consume heap memory proportional to the file size) and the activation events are correlated with the stack traces. This is mostly garbage free as well. The only source of allocations is that we have to sort the execution sample events, each consisting of a timestamp, thread ID, and stack trace ID. We have to do this so we can correlate the already sorted activation events with the stack traces (see the sketch after this list).
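To illustrate the correlation step from the last bullet, here is a sketch with an assumed data model (it is not the agent’s JFR parsing code): the execution samples are sorted by timestamp and then merged with the already ordered activation events in a single pass, assigning each sample to the span that was active on its thread at that moment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of correlating profiler samples with span activation events (assumed data model,
// not the agent's actual JFR handling). Both lists are ordered by timestamp, so a single
// merge-style pass assigns each sample to the span that was active on its thread at that time.
public class SampleCorrelationSketch {

    // spanId == null means "nothing active on this thread anymore"
    record ActivationEvent(long timestampNanos, long threadId, String spanId) {}
    record ExecutionSample(long timestampNanos, long threadId, int stackTraceId) {}

    static Map<ExecutionSample, String> correlate(List<ActivationEvent> activations,
                                                  List<ExecutionSample> samples) {
        // The samples from the JFR file are not ordered, so they have to be sorted first
        // (the one allocating step mentioned above); activation events are already in order.
        List<ExecutionSample> sorted = new ArrayList<>(samples);
        sorted.sort(Comparator.comparingLong(ExecutionSample::timestampNanos));

        Map<Long, String> activeSpanByThread = new HashMap<>();
        Map<ExecutionSample, String> spanForSample = new HashMap<>();
        int a = 0;
        for (ExecutionSample sample : sorted) {
            // Replay all activation events that happened up to this sample's timestamp.
            while (a < activations.size()
                    && activations.get(a).timestampNanos() <= sample.timestampNanos()) {
                ActivationEvent event = activations.get(a++);
                activeSpanByThread.put(event.threadId(), event.spanId());
            }
            String activeSpan = activeSpanByThread.get(sample.threadId());
            if (activeSpan != null) {
                spanForSample.put(sample, activeSpan);
            }
        }
        return spanForSample;
    }

    public static void main(String[] args) {
        List<ActivationEvent> activations = List.of(
                new ActivationEvent(100, 1L, "span-a"),
                new ActivationEvent(500, 1L, null)); // span-a deactivated
        List<ExecutionSample> samples = List.of(
                new ExecutionSample(300, 1L, 42),
                new ExecutionSample(600, 1L, 43)); // falls outside span-a
        correlate(activations, samples).forEach(
                (sample, span) -> System.out.println(sample + " -> " + span));
    }
}
```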

TODOs

Gotchas

  • The actual sampling interval is longer than the one provided to async-profiler. For example, when configuring async-profiler to take a thread snapshot of all threads every 5ms, the actual interval is more like 25ms.

“I see you are using micro benchmarking (JMH); I’m interested in doing macro benchmarks to measure overhead. Would you like to see the results?”

Sure!

1 reaction
felixbarny commented, Jan 11, 2020

Read more comments on GitHub >

