Synapse keeps being killed due to healthcheck failures
Description
I am running a docker stack with the official Synapse image in it.
Absolutely randomly, Synapse just shuts down. See the log below: it receives a SIGTERM and shuts itself down. Everything before the start of the copied log is just normal Synapse logging.
It’s driving me absolutely nuts. Why would it possibly do this? Is this a new ‘feature’ in 1.65?
Steps to reproduce
- Start Synapse service
- Wait a few minutes, hours, or days
Homeserver
My own homeserver
Synapse Version
v1.65.0
Installation Method
Docker (matrixdotorg/synapse)
Platform
Docker on Debian
Relevant log output
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:30,627 - synapse.storage.databases.main.event_push_actions - 969 - INFO - rotate_notifs-62 - Rotating notifications
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:30,628 - synapse.storage.databases.main.event_push_actions - 1130 - INFO - rotate_notifs-62 - Rotating notifications up to: 1326650
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:30,631 - synapse.storage.databases.main.event_push_actions - 1218 - INFO - rotate_notifs-62 - Rotating notifications, handling 0 rows
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:30,661 - synapse.storage.databases.main.event_push_actions - 1293 - INFO - rotate_notifs-62 - Rotating notifications, deleted 0 push actions
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:30,951 - synapse.util.caches.lrucache - 212 - INFO - LruCache._expire_old_entries-62 - Dropped 0 items from caches
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,819 - twisted - 274 - INFO - sentinel - Received SIGTERM, shutting down.
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,820 - synapse.storage.databases.main.lock - 92 - INFO - LockStore._on_shutdown-0 - Dropping held locks due to shutdown
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,821 - synapse.storage.databases.main.lock - 101 - INFO - LockStore._on_shutdown-0 - Dropped locks due to shutdown
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,821 - synapse.handlers.presence - 766 - INFO - presence.on_shutdown-0 - Performing _on_shutdown. Persisting 7 unpersisted changes
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,822 - synapse.app._base - 492 - INFO - sentinel - Shutting down...
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,912 - synapse.handlers.presence - 779 - INFO - presence.on_shutdown-0 - Finished _on_shutdown
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,913 - synapse.http.site - 362 - INFO - GET-553 - Connection from client lost before response was sent
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,914 - synapse.http.site - 362 - INFO - GET-556 - Connection from client lost before response was sent
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,914 - twisted - 274 - INFO - sentinel - (TCP Port 8008 Closed)
matrix_synapse.1.mybcsb5py8xx@beast | 2022-08-25 05:31:39,919 - twisted - 274 - INFO - sentinel - Main loop terminated.
Anything else that would be useful to know?
No response
Sadly, I don’t seem to be getting any events for the task that was shut down, only for the current one (cwo8bra2rt2u).
These are the current and shut-down tasks. I’ve tried various timestamps and checked for the four IDs of the tasks that were shut down, but no dice. I’ll start logging the events to a file manually, then check that file the next time it crashes.
I think so; all tasks in my Matrix stack have been running since I rebooted the Docker daemon for something 5 days ago.
So I guess this issue can be closed, with increasing the health check timeout as the solution. Thanks @DMRobertson and @richvdh for helping me figure this out!
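To pick a sensible timeout, it can help to measure how long the health endpoint actually takes to answer while the server is busy. The following is a minimal Python sketch, not part of Synapse or Docker. It assumes the health check polls `http://localhost:8008/health` (port 8008 appears in the log above; the `/health` path is an assumption about the setup, and the helper names `probe_health` and `sample_latencies` are made up for illustration).

```python
import http.client
import time
import urllib.error
import urllib.request


def probe_health(url: str, timeout: float = 5.0) -> tuple[bool, float]:
    """Send one GET to `url` and return (healthy, elapsed_seconds).

    Note: `timeout` here applies to each blocking socket operation,
    so it only approximates an overall health check deadline.
    """
    start = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            healthy = 200 <= resp.status < 300
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts and HTTP error statuses all count as failures.
        healthy = False
    return healthy, time.monotonic() - start


def sample_latencies(
    url: str, attempts: int = 10, timeout: float = 5.0, pause: float = 1.0
) -> list[tuple[bool, float]]:
    """Probe `url` several times, sleeping `pause` seconds between attempts."""
    results = []
    for i in range(attempts):
        results.append(probe_health(url, timeout))
        if i < attempts - 1:
            time.sleep(pause)
    return results
```

Calling `sample_latencies("http://localhost:8008/health")` from somewhere that can reach the Synapse listener returns one `(healthy, seconds)` pair per attempt. If the slowest samples come close to the configured health check timeout, occasional failed checks followed by a SIGTERM are plausible, and a longer timeout or more retries is the likely remedy.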