
Inferring spans representing method executions based on a statistical profiler

See original GitHub issue

Motivation

Our agents mostly capture I/O events like incoming and outgoing HTTP requests. So when most of the time is spent there, it’s easy to troubleshoot latency issues. However, if the application is slow because of inefficient code, users have to manually instrument the application. But in some cases that’s not feasible, for example when the code base is huge and it’s unclear which methods cause the slowdown.

The trace_methods configuration option is often used to match a large portion, if not all, of the methods within a codebase. The documentation warns that this can significantly increase the overhead and degrade the application’s performance.
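For orientation, such a setup might look like the following in the agent’s configuration file; the package pattern and the threshold value are illustrative placeholders, not recommendations.

```properties
# elasticapm.properties -- illustrative values; com.example.myapp.* is a placeholder pattern
trace_methods=com.example.myapp.*
# only create spans for matched methods that run longer than this
trace_methods_duration_threshold=100ms
```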

The slowdown is inherent to the way trace_methods works. All methods which match the trace_methods expression are instrumented so that a timer is started at the beginning and stopped at the end of the method. If the execution time was significant enough (trace_methods_duration_threshold), a span is created. When frequently executing methods are instrumented, even a small overhead per invocation adds up and can significantly slow down the application. Instrumenting a method can also hinder optimizations the JIT could normally do, like inlining.
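Conceptually, the instrumentation applied by trace_methods amounts to wrapping every matched method roughly like this hand-written sketch (the class and method names are placeholders, not the agent’s actual API):

```java
// Illustrative sketch only: roughly what trace_methods-style instrumentation adds around a
// matched method. The names below are placeholders, not the agent's actual API.
public class TimedMethodSketch {

    // corresponds to trace_methods_duration_threshold
    private static final long THRESHOLD_NANOS = 100_000_000L; // 100ms

    public static int handleRequest(int input) {
        long start = System.nanoTime();
        try {
            return doHandleRequest(input); // the original method body
        } finally {
            long duration = System.nanoTime() - start;
            // A span is only created if the execution was slow enough...
            if (duration >= THRESHOLD_NANOS) {
                System.out.println("would report span: handleRequest took " + duration + "ns");
            }
            // ...but the timing code itself runs on every invocation, which is where the
            // overhead for frequently executed methods comes from, and the extra code can
            // prevent the JIT from inlining the method.
        }
    }

    private static int doHandleRequest(int input) {
        return input * 2;
    }

    public static void main(String[] args) {
        System.out.println(handleRequest(21));
    }
}
```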

The new approach

This issue is about adding an alternative to trace_methods which does not require instrumenting any methods. Instead, a sampling (aka statistical) profiler would be used as the foundation. These profilers work by gathering the stack traces of the application at frequent intervals, like every 20ms.
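As a rough illustration of what such a sampling profiler does (this is not how the agent implements it), the following sketch snapshots the stack traces of all threads every 20ms:

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Minimal sketch of a sampling (statistical) profiler: periodically snapshot all stack traces.
// This is not how the agent implements it; it only illustrates the general idea.
public class SamplingProfilerSketch {
    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService sampler = Executors.newSingleThreadScheduledExecutor();
        sampler.scheduleAtFixedRate(() -> {
            long timestamp = System.nanoTime();
            // One sample: the stack trace of every live thread at this instant.
            Map<Thread, StackTraceElement[]> sample = Thread.getAllStackTraces();
            // A real profiler would store (timestamp, thread, stack trace) for later analysis
            // instead of printing; consecutive samples containing the same frame indicate
            // that the corresponding method was (probably) executing for that whole interval.
            System.out.println(timestamp + ": sampled " + sample.size() + " threads");
        }, 0, 20, TimeUnit.MILLISECONDS); // e.g. every 20ms

        Thread.sleep(200);
        sampler.shutdownNow();
    }
}
```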

By correlating the stack traces with which span was active on which thread at what time, we can build a call tree from the stack traces, correlate it to a span, and create spans from it. As far as the UI is concerned, those spans look just like regular spans, so no changes are required in the UI or the APM Server. At a later stage, we could make the UI aware of the profiler-inferred spans and display them with a special icon or color.
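A simplified sketch of the inference step, with assumed data structures rather than the agent’s internals: runs of consecutive samples that show the same application frame on a thread are folded into one inferred span, which would then be attached as a child of the span that was active on that thread.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified sketch of turning consecutive profiler samples into inferred spans.
// Names and data structures are illustrative assumptions, not the agent's internals.
public class InferredSpanSketch {

    record Sample(long timestampNanos, String topAppFrame) {}
    record InferredSpan(String methodName, long startNanos, long endNanos) {}

    // Fold runs of consecutive samples that show the same application frame into one span.
    static List<InferredSpan> inferSpans(List<Sample> samplesForOneThread, long samplingIntervalNanos) {
        List<InferredSpan> spans = new ArrayList<>();
        Sample runStart = null;
        Sample previous = null;
        for (Sample current : samplesForOneThread) {
            if (runStart == null) {
                runStart = current;
            } else if (!current.topAppFrame().equals(previous.topAppFrame())) {
                // The frame changed: close the previous run as an inferred span.
                spans.add(new InferredSpan(previous.topAppFrame(),
                        runStart.timestampNanos(),
                        previous.timestampNanos() + samplingIntervalNanos));
                runStart = current;
            }
            previous = current;
        }
        if (runStart != null) {
            spans.add(new InferredSpan(previous.topAppFrame(),
                    runStart.timestampNanos(),
                    previous.timestampNanos() + samplingIntervalNanos));
        }
        return spans;
    }

    public static void main(String[] args) {
        long ms = 1_000_000L;
        List<Sample> samples = List.of(
                new Sample(0, "OrderService#calculatePrice"),
                new Sample(20 * ms, "OrderService#calculatePrice"),
                new Sample(40 * ms, "OrderService#calculatePrice"),
                new Sample(60 * ms, "InvoiceRenderer#render"));
        // Each inferred span would then be attached as a child of whatever real span
        // was active on that thread during the same time window.
        inferSpans(samples, 20 * ms).forEach(System.out::println);
    }
}
```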

The tradeoffs

The duration of the spans won’t be as accurate, as we’re not measuring the execution time exactly but rather estimating the duration based on the number of consecutive stack traces in which a method has been present.
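For example, with a 20ms sampling interval, a method seen in 5 consecutive samples would get an estimated duration of roughly 100ms, while its true duration could be anywhere between just over 80ms and just under 120ms:

```java
// Back-of-the-envelope accuracy of profiler-inferred durations (illustrative numbers).
public class DurationEstimateSketch {
    public static void main(String[] args) {
        long samplingIntervalMs = 20;   // e.g. one sample every 20ms
        int consecutiveSamples = 5;     // method was present in 5 consecutive samples

        long estimatedMs = consecutiveSamples * samplingIntervalMs;          // 100ms
        long minTrueMs = (consecutiveSamples - 1) * samplingIntervalMs;      // just over  80ms
        long maxTrueMs = (consecutiveSamples + 1) * samplingIntervalMs;      // just under 120ms

        System.out.printf("estimated %dms, true duration somewhere in (%dms, %dms)%n",
                estimatedMs, minTrueMs, maxTrueMs);
    }
}
```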

To reduce the overhead, the profiler won’t be active all the time. Instead, it’s active for the first 10 seconds of every minute, by default. Only transactions that happen within a profiling session will have profiler-inferred spans.
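A minimal sketch of that duty cycle, using the default 10s-per-60s values described above (the scheduling code is an illustration, not the agent’s implementation):

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustration of the profiling duty cycle: a 10s profiling session at the start of every
// minute. This is not the agent's actual scheduler, just a sketch of the idea; it runs
// until the JVM is stopped.
public class ProfilingDutyCycleSketch {
    public static void main(String[] args) {
        long profilingDurationSeconds = 10; // length of one profiling session
        long intervalSeconds = 60;          // one session per minute

        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(() -> {
            System.out.println("profiling session started");
            // A real implementation would start the sampler here, record for the session
            // length, then stop it. Only transactions that overlap this window can get
            // profiler-inferred spans.
            try {
                TimeUnit.SECONDS.sleep(profilingDurationSeconds); // placeholder for recording
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            System.out.println("profiling session finished");
        }, 0, intervalSeconds, TimeUnit.SECONDS);
    }
}
```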

Q&A

  • Will trace_methods be removed? There are no plans to remove that option, as it can still be useful in combination with or instead of profiler-inferred spans.
  • How can I try this out? Use a snapshot from https://github.com/elastic/apm-agent-java/pull/972 or build the https://github.com/felixbarny/apm-agent-java/tree/inferred-spans branch.
  • How does this relate to https://github.com/elastic/apm/issues/121?
    • The CPU profiling proposal is about having a macro-level view of what the service is doing. This is great to find out about the hotspots of an application. Optimizing those can have a big overall effect on the application. However, it’s less useful to troubleshoot latency if the application is mostly idle waiting for I/O.
    • This issue is about creating regular spans for long executing methods. It’s mostly used to troubleshoot latency for a specific instance of a transaction.
    • There’s no intention to deprecate one in favor of the other. Both are very useful tools to have in your toolbox when optimizing an application.
    • Both can use the same underlying sampling profiler. For the CPU profiling, the stack trace is only considered for threads in a RUNNABLE thread state. Also, for CPU profiling, there will be only one flattened data structure, representing a flame graph for the whole profiling session.
  • Which settings are there to configure the profiler? See the docs preview; an illustrative configuration sketch follows this list.
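For orientation only, a configuration sketch for the profiler settings; the option names and defaults below are assumptions based on the later released documentation, so treat the docs preview linked above as authoritative:

```properties
# Illustrative only; option names and values are assumptions, see the docs preview
# referenced in the Q&A above for the authoritative list of settings.
profiling_inferred_spans_enabled=true
profiling_inferred_spans_sampling_interval=50ms
profiling_inferred_spans_min_duration=0ms
```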

Screenshot

[Screen Shot 2019-12-04 at 16 07 52]

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 4
  • Comments: 23 (16 by maintainers)

Top GitHub Comments

2 reactions
felixbarny commented, Jan 27, 2020

Update: I was able to integrate async-profiler with the agent 🎉

Async-profiler has much lower overhead than ThreadMXBean#getThreadInfo and does not rely on safepoints.

The biggest downside is that async-profiler does not work on Windows. I think it still makes sense to only support async-profiler and not have a ThreadMXBean#getThreadInfo fallback on Windows. The time to reach a safepoint (which means a stop-the-world pause for the application) is quite unpredictable and can regularly be as high as 5ms.
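For context, the ThreadMXBean-based sampling that async-profiler replaces looks roughly like the sketch below (not the agent’s actual code); each such snapshot requires all threads to reach a safepoint, which is the source of the unpredictable pauses mentioned above.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

// Sketch of the ThreadMXBean#getThreadInfo approach that async-profiler replaces.
// Each call brings the JVM to a safepoint (a stop-the-world pause), which is what makes
// this approach comparatively expensive and its latency unpredictable.
public class ThreadMxBeanSamplingSketch {
    public static void main(String[] args) {
        ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
        long[] threadIds = threadMXBean.getAllThreadIds();
        // Capture up to 128 stack frames for every live thread in one snapshot.
        ThreadInfo[] snapshot = threadMXBean.getThreadInfo(threadIds, 128);
        for (ThreadInfo info : snapshot) {
            if (info != null) {
                System.out.println(info.getThreadName() + ": "
                        + info.getStackTrace().length + " frames");
            }
        }
    }
}
```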

How it works in a nutshell:

  • The agent starts async-profiler in wallclock mode for 10s and lets it write to a JFR (Java Flight Recorder format) file.
  • During that profiling session, the agent captures span activation events and writes them to a ring buffer in a garbage-free way.
  • The events in the ring buffer are consumed by a thread that writes them into a memory-mapped file (MappedByteBuffer) of around 10 MB, which can hold around 100k activation events.
  • After the profiling session is over, the binary JFR file is read via a MappedByteBuffer (which doesn’t consume heap memory proportional to the file size) and the activation events are correlated with the stack traces. This is mostly garbage free as well. The only source of allocations is that we have to sort the execution sample events, each consisting of a timestamp, thread ID, and stack trace ID. We have to do this so we can correlate the already sorted activation events with the stack traces (see the sketch after this list).
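To illustrate the correlation step from the last bullet, here is a sketch with an assumed data model (it is not the agent’s JFR parsing code): the execution samples are sorted by timestamp and then merged with the already ordered activation events in a single pass, assigning each sample to the span that was active on its thread at that moment.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of correlating profiler samples with span activation events (assumed data model,
// not the agent's actual JFR handling). Both lists are ordered by timestamp, so a single
// merge-style pass assigns each sample to the span that was active on its thread at that time.
public class SampleCorrelationSketch {

    // spanId == null means "nothing active on this thread anymore"
    record ActivationEvent(long timestampNanos, long threadId, String spanId) {}
    record ExecutionSample(long timestampNanos, long threadId, int stackTraceId) {}

    static Map<ExecutionSample, String> correlate(List<ActivationEvent> activations,
                                                  List<ExecutionSample> samples) {
        // The samples from the JFR file are not ordered, so they have to be sorted first
        // (the one allocating step mentioned above); activation events are already in order.
        List<ExecutionSample> sorted = new ArrayList<>(samples);
        sorted.sort(Comparator.comparingLong(ExecutionSample::timestampNanos));

        Map<Long, String> activeSpanByThread = new HashMap<>();
        Map<ExecutionSample, String> spanForSample = new HashMap<>();
        int a = 0;
        for (ExecutionSample sample : sorted) {
            // Replay all activation events that happened up to this sample's timestamp.
            while (a < activations.size()
                    && activations.get(a).timestampNanos() <= sample.timestampNanos()) {
                ActivationEvent event = activations.get(a++);
                activeSpanByThread.put(event.threadId(), event.spanId());
            }
            String activeSpan = activeSpanByThread.get(sample.threadId());
            if (activeSpan != null) {
                spanForSample.put(sample, activeSpan);
            }
        }
        return spanForSample;
    }

    public static void main(String[] args) {
        List<ActivationEvent> activations = List.of(
                new ActivationEvent(100, 1L, "span-a"),
                new ActivationEvent(500, 1L, null)); // span-a deactivated
        List<ExecutionSample> samples = List.of(
                new ExecutionSample(300, 1L, 42),
                new ExecutionSample(600, 1L, 43)); // falls outside span-a
        correlate(activations, samples).forEach(
                (sample, span) -> System.out.println(sample + " -> " + span));
    }
}
```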

TODOs

Gotchas

  • The actual sampling interval is longer than the one provided to async-profiler. For example, when configuring async-profiler to take a thread snapshot of all threads every 5ms, the actual interval is more like 25ms.

“I see you are using micro benchmarking (JMH); I’m interested in doing macro benchmarks to measure overhead. Would you like to see the results?”

Sure!

1 reaction
felixbarny commented, Jan 11, 2020

Read more comments on GitHub >

