fail to flush the buffer due to "too many connection resets"
What happened: I found in the logs that it frequently failed to flush the buffer due to "too many connection resets":
  2020-03-16 13:19:56 +0000 [warn]: #0 failed to flush the buffer. retry_time=1 next_retry_seconds=2020-03-16 13:21:01 +0000 chunk="5a0f8a89e26f39ee8d09457092bcfae5" error_class=Net::HTTP::Persistent::Error error="too many connection resets (due to Net::ReadTimeout - Net::ReadTimeout) after 0 requests on 70098224503720, last used 1584364796.8142624 seconds ago"
What you expected to happen:
No "failed to flush the buffer" warnings should happen.
How to reproduce it (as minimally and precisely as possible):
Deploy splunk-connect-for-kubernetes 1.3.0 to OpenShift via Helm.
Anything else we need to know?:
My buffer setting:

  buffer:
    '@type': memory
    total_limit_size: 4000m
    chunk_limit_size: 8m
    chunk_limit_records: 10000
    flush_at_shutdown: true
    flush_interval: 3s
    flush_thread_count: 10
    flush_thread_interval: 0.1
    flush_thread_burst_interval: 0.01
    overflow_action: block
    retry_forever: true
    retry_wait: 60
    compress: gzip
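For reference, this is roughly how I understand that buffer hash gets rendered into fluentd's native config for the Splunk HEC output (the match pattern and the HEC host/port/token below are made-up placeholders, not values from my actual deployment):

  <match **>
    @type splunk_hec
    # placeholders; the real host/token come from the Helm values
    hec_host splunk-hec.example.com
    hec_port 8088
    hec_token my-hec-token
    <buffer>
      @type memory
      total_limit_size 4000m
      chunk_limit_size 8m
      chunk_limit_records 10000
      flush_at_shutdown true
      flush_interval 3s
      flush_thread_count 10
      flush_thread_interval 0.1
      flush_thread_burst_interval 0.01
      overflow_action block
      retry_forever true
      retry_wait 60
      compress gzip
    </buffer>
  </match>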
I did not see any related logs on the Splunk master node.
Some pods generate about 1.5 million logs within one hour, and developers want to monitor these logs in real time. I am not that familiar with fluentd, but it should be able to handle 3K logs per second, right?
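For a rough sense of scale (my own back-of-the-envelope math, assuming the 1.5 million figure is per pod per hour): 1,500,000 logs / 3,600 s ≈ 417 logs per second from a single pod, so a handful of such pods on one node is roughly where the 3K per second figure comes from.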
Environment:
- Kubernetes version (use kubectl version):

  $ kubectl version
  Client Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-06-09T23:23:08Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  Server Version: version.Info{Major:"1", Minor:"11+", GitVersion:"v1.11.0+d4cacc0", GitCommit:"d4cacc0", GitTreeState:"clean", BuildDate:"2019-04-10T17:49:11Z", GoVersion:"go1.10.8", Compiler:"gc", Platform:"linux/amd64"}
  $ oc version
  oc v3.11.117
  kubernetes v1.11.0+d4cacc0
  features: Basic-Auth GSSAPI Kerberos SPNEGO

  Server https://internal-master.ocp.local:443
  openshift v3.11.104
  kubernetes v1.11.0+d4cacc0
- Ruby version (use ruby --version):
- OS (e.g. cat /etc/os-release):

  NAME="Red Hat Enterprise Linux Server"
  VERSION="7.7 (Maipo)"
  ID="rhel"
  ID_LIKE="fedora"
  VARIANT="Server"
  VARIANT_ID="server"
  VERSION_ID="7.7"
  PRETTY_NAME="Red Hat Enterprise Linux"
  ANSI_COLOR="0;31"
  CPE_NAME="cpe:/o:redhat:enterprise_linux:7.7:GA:server"
  HOME_URL="https://www.redhat.com/"
  BUG_REPORT_URL="https://bugzilla.redhat.com/"
  REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 7"
  REDHAT_BUGZILLA_PRODUCT_VERSION=7.7
  REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
  REDHAT_SUPPORT_PRODUCT_VERSION="7.7"
- Splunk version: Splunk Enterprise 7.2.3
- Others:
In my case, the default index was set to "main", which the HEC token had no write permission for. Setting the default index to an existing, writable index solved the problem.
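For anyone hitting the same thing, a minimal sketch of where that default index is set in the Helm values (the exact key names here, in particular indexName, are from memory and may differ between chart versions, so double-check your chart's values.yaml):

  global:
    splunk:
      hec:
        host: splunk-hec.example.com
        token: 00000000-0000-0000-0000-000000000000
        # must be an existing index that the HEC token is allowed to write to
        indexName: k8s_logs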
This issue was closed because it has been inactive for 14 days since being marked as stale.