
HttpLogShipper starts hammering target after a certain number of errors

See original GitHub issue

If sending log messages fails for prolonged periods of time (e.g. due to a wrong configuration or the target system being unreachable), the log shipper will start hammering the target every 2 seconds (in the default configuration).

This is due to an overflow in ExponentialBackoffConnectionSchedule#NextInterval, where the cast on line 61 overflows. In the default configuration with a period of 2 seconds, backoffPeriod is 50000000 ticks (per the minimum backoff period of 5 seconds). Once the number of errors reaches 39, backoffFactor becomes 2^38. The expression var backedOff = (long)(backoffPeriod * backoffFactor) then evaluates (long)(2^38 * 50000000); since 50000000 > 2^25, the product exceeds 2^38 * 2^25 = 2^63 and therefore long.MaxValue. The cast to long overflows and yields -9223372036854775808. Following through, line 67 then produces an actual backoff equal to the base period (2 seconds in the default configuration). From that point on, this happens every time the NextInterval getter is called.
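The arithmetic can be reproduced with a minimal standalone sketch (this is not the library's source; the constants and member names are assumptions taken from the defaults described above):

using System;

class BackoffOverflowDemo
{
    // Defaults described in the report above.
    static readonly TimeSpan Period = TimeSpan.FromSeconds(2);          // sink period
    static readonly TimeSpan MinimumBackoff = TimeSpan.FromSeconds(5);  // 50,000,000 ticks
    static readonly TimeSpan MaximumBackoff = TimeSpan.FromMinutes(10);

    static TimeSpan NextInterval(int failuresSinceSuccessfulConnection)
    {
        // No backoff until the connection has actually failed.
        if (failuresSinceSuccessfulConnection <= 1)
            return Period;

        // Exponential growth: 2^(n-1), computed as a double.
        var backoffFactor = Math.Pow(2, failuresSinceSuccessfulConnection - 1);

        // Never back off by less than the minimum backoff period.
        var backoffPeriod = Math.Max(Period.Ticks, MinimumBackoff.Ticks);

        // With 39 failures backoffFactor = 2^38 and backoffPeriod = 5e7 > 2^25,
        // so the product exceeds 2^63. Depending on runtime and CPU, the
        // out-of-range cast yields long.MinValue (as in this report) or saturates.
        var backedOff = (long)(backoffPeriod * backoffFactor);

        // long.MinValue is below the cap, so Min keeps it ...
        var cappedBackoff = Math.Min(MaximumBackoff.Ticks, backedOff);

        // ... and Max then pulls it back up to the base period (2 seconds),
        // which is the hammering behaviour described above.
        var actualBackoff = Math.Max(Period.Ticks, cappedBackoff);

        return TimeSpan.FromTicks(actualBackoff);
    }

    static void Main()
    {
        Console.WriteLine(NextInterval(38)); // 00:10:00, still capped as expected
        Console.WriteLine(NextInterval(39)); // 00:00:02 where the cast wraps to long.MinValue
    }
}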

To Reproduce

  1. Configure the HTTP sink to log to an invalid target, e.g. a computer without a server running (a minimal configuration sketch follows after this list).
  2. Emit a log message and wait for 39 retries (roughly 6 hours; this can be shortened by changing MaximumBackoffInterval to something smaller, e.g. a few seconds).
  3. See HttpLogShipper retrying every 2 seconds (in the default configuration).
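A minimal repro sketch for step 1, assuming a 2020-era version of Serilog.Sinks.Http in which WriteTo.Http only needs the request URI (newer versions add further required parameters); the endpoint is a deliberately unreachable placeholder:

using Serilog;

class ReproProgram
{
    static void Main()
    {
        // Point the HTTP sink at a port where nothing is listening.
        Log.Logger = new LoggerConfiguration()
            .WriteTo.Http("http://localhost:9999/not-listening")
            .CreateLogger();

        Log.Information("This event can never be shipped");

        // Keep the process alive and watch the sink's retries: after 39
        // consecutive failures (roughly 6 hours with the defaults) the backoff
        // collapses from the 10-minute cap back to the 2-second base period.
        System.Threading.Thread.Sleep(System.Threading.Timeout.Infinite);
    }
}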

Expected behavior

The retransmission timeout should stay capped at 10 minutes, even if the messages cannot be sent for prolonged periods of time, e.g. by capping failuresSinceSuccessfulConnection to something reasonably small so that the exponential function does not produce excessively large values. A sketch of that capping approach follows below.
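One possible shape of that fix (a sketch only, not the maintainers' actual patch; the member names mirror the ones quoted above): clamp the exponent before forming the product, so the double-to-long cast can never wrap.

using System;

class CappedBackoffSketch
{
    static readonly TimeSpan Period = TimeSpan.FromSeconds(2);
    static readonly TimeSpan MinimumBackoff = TimeSpan.FromSeconds(5);
    static readonly TimeSpan MaximumBackoff = TimeSpan.FromMinutes(10);

    static TimeSpan NextInterval(int failuresSinceSuccessfulConnection)
    {
        if (failuresSinceSuccessfulConnection <= 1)
            return Period;

        // Cap the exponent: 2^20 times any sensible period is already far beyond
        // the 10-minute maximum yet comfortably inside the long range, so the
        // cast below cannot overflow. The value 20 is an arbitrary, safe choice.
        var cappedFailures = Math.Min(failuresSinceSuccessfulConnection - 1, 20);
        var backoffFactor = Math.Pow(2, cappedFailures);
        var backoffPeriod = Math.Max(Period.Ticks, MinimumBackoff.Ticks);

        var backedOff = (long)(backoffPeriod * backoffFactor);
        var cappedBackoff = Math.Min(MaximumBackoff.Ticks, backedOff);
        var actualBackoff = Math.Max(Period.Ticks, cappedBackoff);

        // Never exceeds 10 minutes and never collapses back to the base period.
        return TimeSpan.FromTicks(actualBackoff);
    }
}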

Issue Analytics

  • State: closed
  • Created 4 years ago
  • Comments: 6 (4 by maintainers)

Top GitHub Comments

1 reaction
ghost commented, Feb 13, 2020

Thanks for the prompt reaction. We're good. We just told our clients to get their sh*t together 😉 The issue only starts filling up the log files after 6 hours of running with a wrong configuration, so most of the time people notice well before then that they are not getting any remote log messages. This one actually popped up on a test system that nobody paid any real attention to, so nothing critical.

0 reactions
FantasticFiasco commented, Feb 21, 2020

Thanks for reporting it. Best of luck to you in the future!

