Not Seeing Metrics Data in Splunk Cloud Index
What happened: We are not seeing metrics data in Splunk Cloud after deploying Splunk Connect for Kubernetes, although log and object data arrive as expected. The metrics pod logs show the following:
2019-07-23 15:44:57 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:12 +0000 [debug]: #0 Received new chunk, size=1046393
2019-07-23 15:45:12 +0000 [debug]: #0 Sending 1046393 bytes to Splunk.
2019-07-23 15:45:23 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:26 +0000 [debug]: #0 Received new chunk, size=1046240
2019-07-23 15:45:26 +0000 [debug]: #0 Sending 1046240 bytes to Splunk.
2019-07-23 15:45:27 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:40 +0000 [debug]: #0 Received new chunk, size=1047250
2019-07-23 15:45:40 +0000 [debug]: #0 Sending 1047250 bytes to Splunk.
2019-07-23 15:45:48 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:45:57 +0000 [debug]: #0 Received new chunk, size=1047460
2019-07-23 15:45:57 +0000 [debug]: #0 Sending 1047460 bytes to Splunk.
2019-07-23 15:46:07 +0000 [debug]: #0 taking back chunk for errors. chunk="58e5b164f15e9a50596c723c5ca003d1"
2019-07-23 15:46:07 +0000 [warn]: #0 failed to flush the buffer. retry_time=0 next_retry_seconds=2019-07-23 15:46:08 +0000 chunk="58e5b164f15e9a50596c723c5ca003d1" error_class=SocketError error="Failed to open TCP connection to http-inputs-<redacted>.splunkcloud.com:443 (getaddrinfo: Try again)"
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:939:in `rescue in block in connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:936:in `block in connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/timeout.rb:93:in `block in timeout'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/timeout.rb:103:in `timeout'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:935:in `connect'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1541:in `begin_transport'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1493:in `transport_request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/2.5.0/net/http.rb:1467:in `request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:951:in `block in request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:648:in `connection_for'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/net-http-persistent-3.0.1/lib/net/http/persistent.rb:945:in `request'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:355:in `send_to_hec'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluent-plugin-splunk-hec-1.1.0/lib/fluent/plugin/out_splunk_hec.rb:167:in `write'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1125:in `try_flush'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2019-07-23 15:46:07 +0000 [warn]: #0 /usr/lib/ruby/gems/2.5.0/gems/fluentd-1.4.0/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-07-23 15:46:08 +0000 [debug]: #0 Received new chunk, size=1047460
2019-07-23 15:46:08 +0000 [debug]: #0 Sending 1047460 bytes to Splunk.
2019-07-23 15:46:11 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:46:11 +0000 [warn]: #0 retry succeeded. chunk_id="58e5b164f15e9a50596c723c5ca003d1"
2019-07-23 15:46:12 +0000 [debug]: #0 Received new chunk, size=1123692
2019-07-23 15:46:12 +0000 [debug]: #0 Sending 1123692 bytes to Splunk.
2019-07-23 15:46:12 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
2019-07-23 15:46:25 +0000 [debug]: #0 Received new chunk, size=1130330
2019-07-23 15:46:25 +0000 [debug]: #0 Sending 1130330 bytes to Splunk.
2019-07-23 15:46:28 +0000 [debug]: #0 [Response] POST https://http-inputs-<redacted>.splunkcloud.com/services/collector: #<Net::HTTPOK 200 OK readbody=true>
What you expected to happen:
From the metrics pod logs, it looks like some posts to Splunk Cloud succeed and others fail. We also do not see any data in the metrics index in Splunk Cloud. Since the HEC connection is a global setting, is there an additional setting we need to add for the metrics collector? Is there something specific about the failing posts that needs to be accounted for in the Helm template?
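To put numbers on "some posts work and others do not", a small script can tally the outcomes in a saved copy of the metrics pod log. This is only a sketch: the regular expressions are modelled on the log lines quoted in this issue, and the function name `summarize_flushes` is made up for illustration. Applied to the excerpt above, it should count seven 200 responses, one SocketError flush failure, and one successful retry.

```python
import re
from collections import Counter

# Patterns modelled on the fluentd log lines quoted in this issue (assumption:
# other plugin versions may format these messages differently).
RESPONSE_RE = re.compile(r"\[Response\] POST \S+: #<Net::HTTP\w+ (\d{3}) ")
FAILURE_RE = re.compile(r"failed to flush the buffer\..*error_class=(\w+)")
RETRY_OK_RE = re.compile(r"retry succeeded")


def summarize_flushes(lines):
    """Count HEC responses by status, flush failures by error class, and retries."""
    summary = Counter()
    for line in lines:
        match = RESPONSE_RE.search(line)
        if match:
            summary["http_" + match.group(1)] += 1
            continue
        match = FAILURE_RE.search(line)
        if match:
            summary["failed_" + match.group(1)] += 1
            continue
        if RETRY_OK_RE.search(line):
            summary["retry_succeeded"] += 1
    return summary
```

For example, after saving the log with `kubectl logs <metrics-pod-name> > metrics.log` (the pod name is a placeholder), calling `summarize_flushes(open("metrics.log"))` returns the counts.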
How to reproduce it (as minimally and precisely as possible):
Install Splunk Connect for Kubernetes with Helm with the following values yaml:
global:
  logLevel: debug
  splunk:
    hec:
      protocol: https
      insecureSSL: false
      host: http-inputs-<redacted>.splunkcloud.com
      token: <redacted>
      port: 443
splunk-kubernetes-logging:
  journalLogPath: /run/log/journal
  splunk:
    hec:
      indexName: <redacted>-log
splunk-kubernetes-objects:
  rbac:
    create: true
  serviceAccount:
    create: true
    name: splunk-kubernetes-objects
  kubernetes:
    insecureSSL: false
  objects:
    core:
      v1:
        - name: pods
          interval: 30s
        - name: namespaces
          interval: 30s
        - name: nodes
          interval: 30s
        - name: services
          interval: 30s
        - name: config_maps
          interval: 30s
        - name: persistent_volumes
          interval: 30s
        - name: service_accounts
          interval: 30s
        - name: persistent_volume_claims
          interval: 30s
        - name: resource_quotas
          interval: 30s
        - name: component_statuses
          interval: 30s
        - name: events
          mode: watch
    apps:
      v1:
        - name: deployments
          interval: 30s
        - name: daemon_sets
          interval: 30s
        - name: replica_sets
          interval: 30s
        - name: stateful_sets
          interval: 30s
  splunk:
    hec:
      indexName: <redacted>-objects
splunk-kubernetes-metrics:
  rbac:
    create: true
  serviceAccount:
    create: true
    name: splunk-kubernetes-metrics
  splunk:
    hec:
      indexName: <redacted>-metrics
Helm command - helm install --name k8-connect splunk-connect-for-kubernetes-1.2.0.tgz --namespace test3 -f values.yaml
Anything else we need to know?:
Environment:
- Kubernetes version (use kubectl version): 1.13-EKS
- OS (e.g: cat /etc/os-release): amazonlinux
- Splunk version:
- Others:
Issue Analytics
- State:
- Created 4 years ago
- Comments:10 (2 by maintainers)
This looks like an HEC configuration issue.
Let's double-check:
Is your Splunk Cloud endpoint http or https? Also, is the URL broken in the configmap for the pod?
Try:
kubectl describe cm <splunk-kubernetes-metrics-configmap-name-here>
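Once the protocol, host, and port have been read out of the configmap, a quick sanity check of the resulting URL can rule out a malformed endpoint. The sketch below assumes the URL has the shape seen in the pod logs (`<protocol>://<host>:<port>/services/collector`); the helper names `hec_url` and `check_hec_url` are hypothetical and not part of Splunk Connect for Kubernetes.

```python
from urllib.parse import urlsplit

# Path taken from the POST lines in the pod log quoted in this issue.
EXPECTED_PATH = "/services/collector"


def hec_url(protocol, host, port, path=EXPECTED_PATH):
    """Assemble an HEC URL from the values found in the configmap."""
    return f"{protocol}://{host}:{port}{path}"


def check_hec_url(url):
    """Return a list of problems found in the URL (empty if none were found)."""
    problems = []
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append(f"unexpected scheme: {parts.scheme!r}")
    if not parts.hostname:
        problems.append("missing host")
    try:
        parts.port  # accessing .port raises ValueError for a non-numeric port
    except ValueError:
        problems.append("invalid port")
    if parts.path != EXPECTED_PATH:
        problems.append(f"unexpected path: {parts.path!r}")
    return problems
```

An empty list only means the URL is well formed; it does not show that the endpoint is reachable or that the token is valid.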
Interestingly enough, the issue so far seems to be environment related. I replaced the HEC host DNS name with one of the IPs in its DNS record and I don't see the issue anymore. I am now working to recreate the setup in one of our other K8 environments to see whether the issue happens there as well.
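The `getaddrinfo: Try again` text in the SocketError corresponds to EAI_AGAIN, a temporary name resolution failure, which fits the observation that switching to an IP address hides the problem. One way to check whether a cluster's DNS is intermittently failing is to resolve the HEC host repeatedly from inside a pod and record the failures. This is a minimal sketch, not a diagnosis tool: `probe_dns` is a hypothetical helper, and the `resolver` parameter exists only so the function can be exercised without network access.

```python
import socket
import time


def probe_dns(host, attempts=20, delay=0.5, resolver=socket.getaddrinfo):
    """Resolve `host` repeatedly; report resolution failures and addresses seen."""
    failures = []
    addresses = set()
    for attempt in range(attempts):
        try:
            for info in resolver(host, 443, type=socket.SOCK_STREAM):
                # info[4] is the socket address tuple; its first item is the IP.
                addresses.add(info[4][0])
        except socket.gaierror as exc:
            # EAI_AGAIN here would match the "Try again" error in the pod log.
            failures.append((attempt, exc.errno, exc.strerror))
        if delay:
            time.sleep(delay)
    return {
        "attempts": attempts,
        "failures": failures,
        "addresses": sorted(addresses),
    }
```

Calling `probe_dns("http-inputs-<redacted>.splunkcloud.com")` with the real host name from a pod in the affected cluster, and again from the other environment, would show whether failures appear in only one of them.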