HttpLogShipper starts hammering target after a certain amount of errors
If sending log messages fails for prolonged periods of time (e.g. due to a wrong configuration or the target system being unreachable), the log shipper will start hammering the target every 2 seconds (in the default configuration).
This is caused by an overflow in ExponentialBackoffConnectionSchedule#NextInterval, where the cast on line 61 overflows. In the default configuration with a period of 2 seconds, backoffPeriod is 50000000 ticks (the minimum backoff period of 5 seconds). When the number of errors reaches 39, backoffFactor becomes 2^38. The expression var backedOff = (long)(backoffPeriod * backoffFactor) then evaluates (long)(2^38 * 50000000). Because 50000000 > 2^25, the product is greater than 2^38 * 2^25 = 2^63 and no longer fits in a long. The cast to long overflows and gives -9223372036854775808. Following through, line 67 then sets the actual backoff to the base period (2 seconds in the default configuration). From that point on, this happens every time NextInterval.get is called.
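The arithmetic can be checked by hand. The following worked sketch uses only the values stated above (a minimum backoff period of 5 seconds, i.e. 5 x 10^7 ticks, and a factor of 2^(n-1) after n consecutive failures):

```latex
\begin{align*}
n = 38:&\quad 5 \times 10^{7} \times 2^{37} \approx 6.87 \times 10^{18} \;<\; 2^{63} - 1 \approx 9.22 \times 10^{18} \\
n = 39:&\quad 5 \times 10^{7} \times 2^{38} \approx 1.37 \times 10^{19} \;>\; 2^{63} - 1 \approx 9.22 \times 10^{18}
\end{align*}
```

So the 39th consecutive failure is the first one whose product no longer fits in a signed 64-bit integer.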
- Configure the HTTP sink to log to an invalid target (e.g. a computer without a server running).
- Emit a log message and wait for 39 retries (roughly 6 hours; this can be shortened by changing MaximumBackoffInterval to something shorter, e.g. a few seconds).
- Observe HttpLogShipper retrying every 2 seconds (in the default configuration).
The retransmission timeout should stay capped at 10 minutes, even if messages cannot be sent for prolonged periods of time (e.g. by capping failuresSinceSuccessfulConnection to something reasonably small, so that the exponential function does not produce excessively large values).
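One possible way to get this behaviour is sketched below. It is not the library's actual code: the class CappedBackoff, its method signature and its parameter names are hypothetical and only mirror the identifiers mentioned in this issue. Instead of capping the failure counter, this variant applies the maximum while the value is still a double, so the number being cast always fits in a long. It assumes the minimum backoff period is positive.

```csharp
using System;

// Hypothetical helper for illustration only; not the sink's real implementation.
public static class CappedBackoff
{
    public static TimeSpan NextInterval(
        TimeSpan period,
        TimeSpan minimumBackoffPeriod,
        TimeSpan maximumBackoffInterval,
        int failuresSinceSuccessfulConnection)
    {
        // No failure yet, or the first failure: retry after the base period.
        if (failuresSinceSuccessfulConnection <= 1) return period;

        // Exponential factor: 2x, 4x, 8x, ... computed as a double.
        double backoffFactor = Math.Pow(2, failuresSinceSuccessfulConnection - 1);

        // Very short periods are raised to the minimum backoff period.
        double backoffPeriod = Math.Max(period.Ticks, minimumBackoffPeriod.Ticks);

        // Apply the cap BEFORE casting, so the double can never exceed the long range.
        double capped = Math.Min(maximumBackoffInterval.Ticks, backoffPeriod * backoffFactor);

        // Never go below the base period.
        long actual = Math.Max(period.Ticks, (long)capped);
        return TimeSpan.FromTicks(actual);
    }
}
```

With the values from this issue (period 2 seconds, minimum backoff 5 seconds, maximum 10 minutes), this sketch returns 10 minutes for 39 failures and for any larger failure count, because even an infinite factor is clamped to the maximum before the cast.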
- Created 4 years ago
- Comments: 6 (4 by maintainers)
Top GitHub Comments
Thanks for the prompt reaction. We’re good. We just told our clients to get their sh*t together 😉 The issue starts filling up the log files after 6 hours of having a wrong configuration, so most of the time people notice well before then that they are not getting any remote log messages. This one actually popped up on a test system that nobody paid any real attention to, so nothing critical.
Thanks for reporting it. Best of luck to you in the future!