AES needs many restarts before it reaches the Running state.

Describe the bug
After deploying AES on an Azure AKS cluster:

aks… Ready agent v1.13.12 4.15.0-1082-azure docker://3.0.10+azure

While the ambassador-… pods are starting, there is a Python error in the logs, and requests to /diag/ return a 500 Internal Server Error.

2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] INFO: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" START
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 'NoneType' object has no attribute 'overview'
Traceback (most recent call last):
  File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 233, in wrapper
    result = f(*args, reqid=reqid, **kwds)
  File "/usr/lib/python3.7/site-packages/ambassador-0.0.0.dev0-py3.7.egg/ambassador_diag/diagd.py", line 519, in show_overview
    ov = diag.overview(request, app.estats)
AttributeError: 'NoneType' object has no attribute 'overview'
2020-05-07 09:36:28 diagd 1.4.2 [P219TThreadPoolExecutor-0_4] ERROR: 7C716B56-786E-426C-B03C-96554F086837: 127.0.0.1 "GET /ambassador/v0/diag/" 1ms 500 server error
time="2020-05-07 09:36:28" level=error msg="Bad HTTP response" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet.func1 file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:165" status_code=500 subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true"
time="2020-05-07 09:36:28" level=error msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func=github.com/datawire/apro/cmd/amb-sidecar/devportal/server.HTTPGet file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:172" subsystem=fetcher url="http://127.0.0.1:8877/ambassador/v0/diag/?json=true"
time="2020-05-07 09:36:28" level=info msg="HTTP error 500 from http://127.0.0.1:8877/ambassador/v0/diag/?json=true" func="github.com/datawire/apro/cmd/amb-sidecar/devportal/server.(*fetcher)._retrieve" file="github.com/datawire/apro@/cmd/amb-sidecar/devportal/server/fetcher.go:195"
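One possible reading of the traceback is that the handler runs before the diagnostics object exists, so the attribute lookup happens on None. The sketch below only reproduces that error shape in isolation. The Diagnostics class and both handler functions are hypothetical stand-ins, not the actual diagd code, and the 503 guard is an illustration rather than a confirmed fix.

```python
from typing import Optional, Tuple


class Diagnostics:
    """Hypothetical stand-in for the object the handler expects to have."""

    def overview(self) -> dict:
        return {"status": "ok"}


def show_overview_unguarded(diag: Optional[Diagnostics]) -> Tuple[int, dict]:
    # Same call shape as the failing line in the traceback:
    # if diag is still None, this raises AttributeError.
    return 200, diag.overview()


def show_overview_guarded(diag: Optional[Diagnostics]) -> Tuple[int, dict]:
    # Assumption: reporting "not ready yet" is preferable to an unhandled 500.
    if diag is None:
        return 503, {"error": "diagnostics not available yet"}
    return 200, diag.overview()


def try_unguarded(diag: Optional[Diagnostics]) -> str:
    """Return the error message raised by the unguarded handler, or '' if none."""
    try:
        show_overview_unguarded(diag)
    except AttributeError as exc:
        return str(exc)
    return ""


if __name__ == "__main__":
    print(try_unguarded(None))
    print(show_overview_guarded(None))
    print(show_overview_guarded(Diagnostics()))
```

Calling the unguarded version with None raises the same AttributeError message that appears in the log, while the guarded version returns a status the caller can retry on.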

To Reproduce
Steps to reproduce the behavior:

  1. Follow this documentation: https://www.getambassador.io/docs/latest/topics/install/

Expected behavior
A running, stable deployment, with pods that start in about 1 minute.

Versions (please complete the following information):

  • Ambassador: [1.3.1, 1.4.1, 1.4.2]
  • Kubernetes environment [AKS]
  • Version [1.13.12, 1.16.7]

Additional context
The problem is persistent. Deleting the pod does not help, and it also persists when I scale the pods up and down. Every new pod needs 7 to 10 restarts before it reaches the Running state.

Issue Analytics

  • State:closed
  • Created 3 years ago
  • Reactions:1
  • Comments:10

Top GitHub Comments

1 reaction
attilajanko commented, Jun 12, 2020

This configuration for the liveness and readiness probes seems to have helped us:

initialDelaySeconds: 90
periodSeconds: 60
timeoutSeconds: 15
failureThreshold: 10
successThreshold: 1

With these settings the 500 error still occurs, but there are no restarts and the pods start in about 5 minutes:

ambassador-7788d44cd7-7ndt2 1/1 Running 0 5m16s
ambassador-86dbc79c74-64bmk 1/1 Running 0 4m2s
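To see why the probe settings quoted above would stop the restarts, it can help to estimate how long the kubelet tolerates a failing liveness probe before restarting the container. The following is a rough sketch, assuming the first probe fires at about initialDelaySeconds and that failureThreshold consecutive failures are needed; the Probe class and the function name are made up for illustration, and real kubelet timing includes jitter. The default field values are the Kubernetes defaults for probes, not necessarily what the Ambassador manifests ship with.

```python
from dataclasses import dataclass


@dataclass
class Probe:
    # Field defaults follow the Kubernetes probe defaults.
    initial_delay_seconds: int = 0
    period_seconds: int = 10
    timeout_seconds: int = 1
    failure_threshold: int = 3
    success_threshold: int = 1


def approx_seconds_until_restart(probe: Probe) -> int:
    """Approximate time a container may keep failing before it is restarted.

    First probe at initial_delay_seconds, then one probe per period; the
    last failing probe may take up to timeout_seconds to be counted.
    """
    return (
        probe.initial_delay_seconds
        + (probe.failure_threshold - 1) * probe.period_seconds
        + probe.timeout_seconds
    )


# Values taken from the comment above.
tuned = Probe(
    initial_delay_seconds=90,
    period_seconds=60,
    timeout_seconds=15,
    failure_threshold=10,
    success_threshold=1,
)

if __name__ == "__main__":
    print(approx_seconds_until_restart(tuned))    # 90 + 9*60 + 15 = 645
    print(approx_seconds_until_restart(Probe()))  # 0 + 2*10 + 1 = 21
```

Under these assumptions the tuned probe allows roughly 645 seconds, which comfortably covers the reported start-up time of about 5 minutes.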

1 reaction
attilajanko commented, Jun 12, 2020

The issue also exists in version 1.5.2.
