Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging third-party libraries. It collects links to all the places you might be looking while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

deadlock; recursive locking

See original GitHub issue

What happened:

I am pretty new to Splunk. I deployed Splunk Connect for Kubernetes (1.4.0) in the cluster and see the error below in the agent that runs on the master node. Kube API server logs are not pushed to the Splunk server.

2020-03-23 08:31:30 +0000 [warn]: #0 dump an error event: error_class=ThreadError error="deadlock; recursive locking" location="/usr/share/gems/gems/fluent-plugin-concat-2.4.0/lib/fluent/plugin/filter_concat.rb:189:in `synchronize'" tag="tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log" time=2020-03-23 08:31:25.150968430 +0000 record={"log"=>"I0323 08:31:25.150846 1 controller.go:107] OpenAPI AggregationController: Processing item v1beta1.metrics.k8s.io\n", "stream"=>"stderr", "source"=>"/var/log/containers/kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log"}

2020-03-23 08:31:30 +0000 [info]: #0 Timeout flush: tail.containers.var.log.containers.kube-apiserver-k8s1m_kube-system_kube-apiserver-f71d1b0e611b1f82d45637a2aaae75b5a0849b966bab165f8fa3078194b55b1a.log:stderr
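
For context, Ruby's core Mutex is non-reentrant: "deadlock; recursive locking" is the ThreadError raised when a thread calls synchronize on a mutex it already holds. A minimal reproduction, independent of the plugin:

    mutex = Mutex.new  # Ruby's core Mutex does not allow re-entry

    mutex.synchronize do
      # Acquiring the same mutex again from the same thread raises
      # ThreadError: deadlock; recursive locking
      mutex.synchronize { puts "never reached" }
    end

The Timeout flush line right after the error suggests the concat filter's timeout flush is re-entering the filter's own mutex on the same thread, which is what the workaround below routes around.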

What you expected to happen:

API server logs get pushed to the Splunk server.

Anything else we need to know?:

Application logs from other containers are being pushed to the Splunk server.

Environment:

  • Kubernetes version (use kubectl version): 1.15.4
  • Ruby version (use ruby --version): ruby 2.5.5p157
  • OS (e.g. cat /etc/os-release): Ubuntu 18.04.2 LTS
  • Splunk version: 8.0.1
  • SCK version: 1.4.0

Issue Analytics

  • State: closed
  • Created: 3 years ago
  • Comments: 9

Top GitHub Comments

2 reactions
matthewmodestino commented, Mar 30, 2020

I was able to work around this by following the thread I posted above on the concat repo.

I started by removing all the shipped concat filters from the logging and umbrella chart values.yaml (these should be optional anyway), then updated the configmap sources that need concat to point to @label @CONCAT (the containers and files sources currently allow you to set concat):

<source>
  @id containers.log
  @type tail
  # route through the new @CONCAT label instead of filtering inline
  @label @CONCAT
  tag tail.containers.*
  path {{ .Values.fluentd.path | default "/var/log/containers/*.log" }}
  {{- if .Values.fluentd.exclude_path }}
  exclude_path {{ .Values.fluentd.exclude_path | toJson }}
  {{- end }}
  pos_file /var/log/splunk-fluentd-containers.log.pos
  path_key source
  read_from_head true
  <parse>
    {{- if eq .Values.containers.logFormatType "cri" }}
    @type regexp
    expression /^(?<time>.+) (?<stream>stdout|stderr) [^ ]* (?<log>.*)$/
    time_format {{ .Values.containers.logFormat | default "%Y-%m-%dT%H:%M:%S.%N%:z" }}
    {{- else if eq .Values.containers.logFormatType "json" }}
    @type json
    time_format %Y-%m-%dT%H:%M:%S.%NZ
    {{- end }}
    time_key time
    time_type string
    localtime false
  </parse>
</source>

Then I moved the concat filters outside the @SPLUNK label in output.conf:

<label @CONCAT>
  # = filters for container logs =
  {{- range $name, $logDef := .Values.logs }}
  {{- if and $logDef.from.pod $logDef.multiline }}
  <filter tail.containers.var.log.containers.{{ $logDef.from.pod }}*{{ or $logDef.from.container $name }}*.log>
    @type concat
    key log
    # send timeout-flushed events to @SPLUNK instead of back through
    # this filter, working around the recursive-locking error
    timeout_label @SPLUNK
    stream_identity_key stream
    multiline_start_regexp {{ $logDef.multiline.firstline }}
    flush_interval {{ $logDef.multiline.flushInterval | default "2s" }}
    separator "\n"
    use_first_timestamp true
  </filter>
  {{- end }}
  {{- end }}

  <match **>
    @type relabel
    @label @SPLUNK
  </match>
</label>

<label @SPLUNK>
  # Enrich log with k8s metadata
  <filter tail.containers.**>
    @type kubernetes_metadata
    annotation_match [ "^splunk\.com" ]
    de_dot false
  </filter>
  <filter tail.containers.**>
    @type record_transformer
    enable_ruby
    <record>
...
...

Because not all logs need the same concat settings (separator, for example: in the example above my pod needs separator "\n", while some don't), I believe we need to expose more multiline settings in the helm chart instead of rendering all concat filters with the same settings block.

So, to solve this, we need to:

  • Correct the configmap so that source.containers.conf and source.files.conf point to a new label (e.g. @CONCAT) that contains the multiline logic in output.conf
  • Expose more multiline settings in the helm chart, especially separator, as not all concat rules need the same treatment (see the sketch after this list)
  • Remove our shipped concat filters, or make them optional
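
As a rough sketch, per-log multiline settings in values.yaml could look like the following. firstline and flushInterval already exist in the chart (see the filter template above), while separator and useFirstTimestamp are hypothetical keys the chart would still need to expose:

    logs:
      my-app:                               # hypothetical example log definition
        from:
          pod: my-app
        multiline:
          firstline: /^\d{4}-\d{2}-\d{2}/   # existing chart setting
          flushInterval: 2s                 # existing chart setting
          separator: "\n"                   # hypothetical: per-log separator
          useFirstTimestamp: true           # hypothetical: per-log timestamp handling

The concat filter template would then render separator and friends from each $logDef.multiline per log definition, rather than hard-coding one settings block for every rule.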
0 reactions
matthewmodestino commented, Apr 10, 2020

@szymonpk PR is in for review. Once the team has a chance to take a look, I will add another to expose the "separator" option…and will look for any others we think we need to make multiline logs pretty.

Read more comments on GitHub >

Top Results From Across the Web

  • what causes 'deadlock; recursive locking' error in a Rails app?
    I'm especially confused because we run only a single worker. Our setup: Rails 3.2.12, Heroku app, Postgres, several web dynos but only 1...
  • Avoiding Deadlock
    The most common error that causes deadlock is self deadlock or recursive deadlock. In a self deadlock or recursive deadlock, a thread tries...
  • ruby-2.1.7: kernel_gem.rb:67:in `synchronize': deadlock; recursive locking (ThreadError) #2137
  • ThreadError: deadlock; recursive locking - GitLab.org
  • The case of the recursively-acquired non-recursive lock, and ...
    A customer encountered a deadlock due to unexpected reentrancy, and they were looking for guidance in fixing it. Here's the code in question ...
