Multiline events not flushing until next event occurs
I have the following configuration:
```yaml
logs:
  my-app:
    from:
      pod: my-app
      container: my-app
    multiline:
      firstline: /^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d,\d{3}[+-]\d\d \[[a-zA-Z]+\] \[/
    sourcetype: app:log4j2
```
I am getting multi-line events as I expect. The problem is that each event is buffered and isn't delivered until the next event starts. The timestamp in Splunk is the timestamp of when the next event starts buffering. For example, event-a is written to the log and buffers somewhere in fluentd. Say 30 seconds pass and then event-b comes into the log; at that point event-a is sent to Splunk with a timestamp that is 30 seconds late.
At first I thought something was wrong with the flush_interval for the concat plugin. In splunk-kubernetes-logging/templates/configMap.yaml, line 160 reads flush_interval {{ $logDef.multiline.flushInterval | default "5s" }}, and I thought the value was supposed to be 5 instead of 5s (see the fluentd concat documentation). However, that change made no difference. I can also see that the timeout flush itself does fire. From the fluentd log:
2019-09-20 18:21:22 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.my-app-655555b857-jlknz_default_my-app-80fb6e9b4517fbb754758b2e821464384bd30a5e2ce4f538cd050ef4c3e1c281.log:stdout
So I can see fluentd saying the concat flush occurred, but the event does not get sent.
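For context, here is a minimal sketch of what the rendered concat filter looks like for this log. The tag pattern and record key are illustrative assumptions, not copied verbatim from configMap.yaml; the firstline regexp, the 5s default, and the @SPLUNK timeout_label are the values discussed in this issue:

```
# Minimal sketch of the rendered concat filter; the tag pattern and key
# name are assumptions, not the exact contents of configMap.yaml.
<filter tail.containers.**>
  @type concat
  key log
  # A new event starts when a line matches the firstline pattern, e.g.
  # "2019-09-20T18:21:22,123+00 [INFO] ["
  multiline_start_regexp /^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d,\d{3}[+-]\d\d \[[a-zA-Z]+\] \[/
  flush_interval 5s
  # Where timed-out (flushed) events are emitted. In the chart this points
  # at the label section that is already executing, which is the bug
  # described below.
  timeout_label @SPLUNK
</filter>
```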
I found the root cause: the timeout path and the normal path need to target a common, separate label section for shared processing. See the fluentd concat plugin documentation under "Handle timeout log lines the same as normal logs".
In the output.conf section of splunk-kubernetes-logging/templates/configMap.yaml, timeout_label is set to @SPLUNK, which is the label section that is already executing. I created a new label section named @HEC and made it the target of both the timeout_label processing and the normal log processing (by adding a relabel). Once I did this, the concat timeout processing sent the event to the @HEC label and it was processed at the correct time.
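A minimal sketch of the resulting layout follows; the match patterns and the exact contents of each section are illustrative assumptions, not the chart's actual output.conf:

```
<label @SPLUNK>
  <filter tail.containers.**>
    @type concat
    key log
    multiline_start_regexp /^\d{4}-\d\d-\d\dT\d\d:\d\d:\d\d,\d{3}[+-]\d\d \[[a-zA-Z]+\] \[/
    flush_interval 5s
    # Timed-out events are re-emitted into the @HEC label below instead of
    # back into @SPLUNK, which is already executing.
    timeout_label @HEC
  </filter>

  # Normal flow: relabel records that passed the concat filter into @HEC too.
  <match **>
    @type relabel
    @label @HEC
  </match>
</label>

# Common processing and output for both the normal and the timeout path.
<label @HEC>
  <match **>
    @type splunk_hec
    # hec_host, hec_token, sourcetype, etc. omitted
  </match>
</label>
```

With this layout, an event flushed by the concat timeout follows the same route to the HEC output as a normally completed event, so it is sent immediately with its own timestamp.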
This fix has been merged in https://github.com/splunk/splunk-connect-for-kubernetes/pull/369 and released as version 1.4.1. Please reopen if it is still not resolved. Thank you!