Not Seeing Metrics Data in Splunk Cloud Index

What happened: We are not seeing metrics data in Splunk Cloud after deploying Splunk Connect for Kubernetes. We are, however, able to see log and object data. The metrics pod logs show the following:

2019-07-23 15:44:57 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:12 +0000 [debug]: #0 Received new chunk, size=1046393
2019-07-23 15:45:12 +0000 [debug]: #0 Sending 1046393 bytes to Splunk.
2019-07-23 15:45:23 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:26 +0000 [debug]: #0 Received new chunk, size=1046240
2019-07-23 15:45:26 +0000 [debug]: #0 Sending 1046240 bytes to Splunk.
2019-07-23 15:45:27 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:40 +0000 [debug]: #0 Received new chunk, size=1047250
2019-07-23 15:45:40 +0000 [debug]: #0 Sending 1047250 bytes to Splunk.
2019-07-23 15:45:48 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:57 +0000 [debug]: #0 Received new chunk, size=1047460
2019-07-23 15:45:57 +0000 [debug]: #0 Sending 1047460 bytes to Splunk.
2019-07-23 15:46:07 +0000 [debug]: #0 taking back chunk for errors. chunk="58e5b164f15e9a50596c723c5ca003d1"
2019-07-23 15:46:07 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-23 15:46:08 +0000 chunk="58e5b164f15e9a50596c723c5ca003d1" error_class=SocketError error="Failed to open TCP connection to http-inputs-<redacted>.splunkcloud.com:443 (getaddrinfo: Try again)"
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:939:in `rescue in block in connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:936:in `block in connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1541:in `begin_transport'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1493:in `transport_request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1467:in `request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:951:in `block in request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:648:in `connection_for'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:945:in `request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:355:in `send_to_hec'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:167:in `write'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1125:in `try_flush'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-07-23 15:46:08 +0000 [debug]: #0 Received new chunk, size=1047460
2019-07-23 15:46:08 +0000 [debug]: #0 Sending 1047460 bytes to Splunk.
2019-07-23 15:46:11 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:46:11 +0000 [warn]: #0 retry succeeded. chunk_id="58e5b164f15e9a50596c723c5ca003d1"
2019-07-23 15:46:12 +0000 [debug]: #0 Received new chunk, size=1123692
2019-07-23 15:46:12 +0000 [debug]: #0 Sending 1123692 bytes to Splunk.
2019-07-23 15:46:12 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:46:25 +0000 [debug]: #0 Received new chunk, size=1130330
2019-07-23 15:46:25 +0000 [debug]: #0 Sending 1130330 bytes to Splunk.
2019-07-23 15:46:28 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>

What you expected to happen:

From the metrics pod logs, it looks like some posts to Splunk Cloud succeed and others do not. We are also not seeing any data in the metrics index in Splunk Cloud. Since the HEC connection is a global setting, is there a setting we need to add to the metrics collector? Is there something specific about the posts that fail that needs to be accounted for in the Helm template?
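To put numbers on "some posts work and others do not", the pod log can be tallied. The following is a minimal Python sketch, not part of Splunk Connect for Kubernetes: the function name `summarize` and the regular expressions are assumptions based only on the log lines quoted in this issue. It scans the whole text rather than individual lines, so it works whether or not the log has been split one entry per line.

```python
import re
from collections import Counter

# Patterns inferred from the fluentd output quoted in this issue (assumption:
# other versions of the plugin may word these messages differently).
OK_RE = re.compile(r"\[Response\] POST \S+ #<Net::HTTPOK 200 OK")
FAIL_RE = re.compile(r"failed to flush the buffer\..*?error_class=(\S+)", re.S)
RETRY_OK_RE = re.compile(r"retry succeeded\.")


def summarize(log_text: str) -> Counter:
    """Count successful posts, flush failures per error class, and recovered retries."""
    counts = Counter()
    counts["ok"] = len(OK_RE.findall(log_text))
    for error_class in FAIL_RE.findall(log_text):
        counts[f"failed:{error_class}"] += 1
    counts["retry_succeeded"] = len(RETRY_OK_RE.findall(log_text))
    return counts
```

Feeding it the saved output of the metrics pod log gives a quick view of whether failures are occasional and recovered by retries, or whether chunks are actually being dropped.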

How to reproduce it (as minimally and precisely as possible):

Install Splunk Connect for Kubernetes with Helm with the following values yaml:

global:
  logLevel: debug
  splunk:
    hec:
      protocol: https
      insecureSSL: false
      host: http-inputs-<redacted>.splunkcloud.com
      token: <redacted>
      port: 443
splunk-kubernetes-logging:
  journalLogPath: /run/log/journal
  splunk:
    hec:
      indexName: <redacted>-log
splunk-kubernetes-objects:
  rbac:
    create: true
  serviceAccount:
    create: true
    name: splunk-kubernetes-objects
  kubernetes:
    insecureSSL: false
  objects:
    core:
      v1:
        - name: pods
          interval: 30s
        - name: namespaces
          interval: 30s
        - name: nodes
          interval: 30s
        - name: services
          interval: 30s
        - name: config_maps
          interval: 30s
        - name: persistent_volumes
          interval: 30s
        - name: service_accounts
          interval: 30s
        - name: persistent_volume_claims
          interval: 30s
        - name: resource_quotas
          interval: 30s
        - name: component_statuses
          interval: 30s
        - name: events
          mode: watch
    apps:
      v1:
        - name: deployments
          interval: 30s
        - name: daemon_sets
          interval: 30s
        - name: replica_sets
          interval: 30s
        - name: stateful_sets
          interval: 30s
  splunk:
    hec:
      indexName: <redacted>-objects
splunk-kubernetes-metrics:
  rbac:
    create: true
  serviceAccount:
    create: true
    name: splunk-kubernetes-metrics
  splunk:
    hec:
      indexName: <redacted>-metrics

Helm command - helm install --name k8-connect splunk-connect-for-kubernetes-1.2.0.tgz --namespace test3 -f values.yaml

Anything else we need to know?:

Environment:

Kubernetes version (use kubectl version): 1.13-EKS
OS (e.g: cat /etc/os-release): amazonlinux
Splunk version:
Others:

Top GitHub Comments

matthewmodestino commented, Jul 23, 2019

looks like hec config

let’s doublecheck this:

2019-07-23 15:46:07 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-23 15:46:08 +0000 chunk="58e5b164f15e9a50596c723c5ca003d1" error_class=SocketError error="Failed to open TCP connection to http-inputs-<redacted>.splunkcloud.com:443 (getaddrinfo: Try again)"

Is your Splunk Cloud endpoint http or https? Also, is the URL broken in the configmap for the pod?

try kubectl describe cm <splunk-kubernetes-metrics-configmap-name-here>
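Once the HEC values have been read out of the configmap, a quick sanity check on the resulting URL can rule out a malformed endpoint. This is a rough Python sketch under stated assumptions: `build_hec_url` and `find_url_problems` are hypothetical helpers, not part of the chart or the fluentd plugin, and the `/services/collector` path is taken from the log lines in this issue.

```python
from urllib.parse import urlsplit


def build_hec_url(protocol: str, host: str, port: int) -> str:
    """Join protocol, host, and port into the collector URL seen in the logs."""
    return f"{protocol}://{host}:{port}/services/collector"


def find_url_problems(url: str) -> list:
    """Return human-readable problems; an empty list means nothing obvious is wrong."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme {parts.scheme!r}")
    host = parts.hostname or ""
    if not host:
        problems.append("missing host")
    for label in host.split("."):
        # An empty label or a leading/trailing hyphen suggests a lost substitution.
        if host and (label == "" or label.startswith("-") or label.endswith("-")):
            problems.append(f"suspicious host label {label!r} in {host!r}")
    try:
        parts.port  # raises ValueError when the port is not a valid number
    except ValueError:
        problems.append("port is not a valid number")
    return problems
```

A clean result only says the URL is well formed. It does not show that the host resolves or that the token and index are accepted.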

BobWieberdink commented, Jul 29, 2019

Interestingly enough, the issue, so far, seems to be environment related. I replaced HEC Host DNS with one of the IPs in the DNS record and I don’t see the issue anymore. I am now working to recreate in one of our other K8 environments to see if the issue happens in the new environment.
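Since swapping the hostname for an IP made the errors disappear, intermittent DNS resolution inside the cluster is a plausible suspect. The `getaddrinfo: Try again` message typically corresponds to a temporary name-resolution failure (`EAI_AGAIN`). The Python sketch below is one way to check for that from inside a pod. The function name `probe_dns`, the attempt count, and the delay are all assumptions for illustration, and `resolver` is injectable only so the logic can be exercised without a network.

```python
import socket
import time
from typing import Callable, Dict


def probe_dns(host: str, attempts: int = 20, delay: float = 0.5,
              resolver: Callable = socket.getaddrinfo) -> Dict[str, int]:
    """Resolve `host` repeatedly and tally successes and failures."""
    tally = {"ok": 0, "temporary_failure": 0, "other_failure": 0}
    for i in range(attempts):
        try:
            resolver(host, 443)
            tally["ok"] += 1
        except socket.gaierror as exc:
            # EAI_AGAIN is the resolver's "try again later" error code.
            if exc.errno == socket.EAI_AGAIN:
                tally["temporary_failure"] += 1
            else:
                tally["other_failure"] += 1
        if i < attempts - 1:
            time.sleep(delay)
    return tally
```

A non-zero `temporary_failure` count for the HEC hostname would point at the cluster's DNS rather than at the Helm values, which would fit the environment-specific behaviour described here.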