bgpalerter seems to have lost visibility ~ April 8 - RIS issues ?
It took me a few days to report this, as I wanted to make sure it was not a local issue:
Since April 8, ~1200 UTC I am not seeing any monitored events being triggered. RIPE Service status indicates all is well, RIS Live should be functioning.
Running on RHEL 8 as a systemd service. About a year in prod. Worked fine after updating to v1.27.1.
I checked that notifications work with the -t flag. They do. It spams Email and Telegram when I use the -t flag. I checked that the process has sufficient resources and permissions - all good. I checked bgpalerter’s reports.log and it is indeed empty but I know I created plenty of mayhem “events” 😛
I then tried creating a new prefixes.yml list and config.yml by stopping the service, renaming the existing ones, and executing the binary manually once with bgpalerter-linux-x64 generate -a ASN -o prefixes.yml -i -m
This completed without errors and created sensible files. I restarted the service, withdrew a monitored prefix, and ran tail -f reports.log. Nothing.
I hijacked my prefix from a lab ASN. Nothing.
Final sanity check before reaching out: I spun up a new Ubuntu server, installed docker and created the bgpalerter docker container. Created config, started it, withdrew a prefix. This instance, too, does not “see” an event.
I do see the following in error.log of the original prod instance around the time things stopped working:
2021-04-10T14:34:44+00:00 info: ris connector connected
2021-04-10T14:39:45+00:00 info: ris connector connected
2021-04-10T15:05:15+00:00 error: Error: Unexpected server response: 500
2021-04-10T15:05:15+00:00 error: It was not possible to establish a connection with RIPE RIS
2021-04-10T15:06:20+00:00 error: Error: Unexpected server response: 500
2021-04-10T15:06:20+00:00 error: It was not possible to establish a connection with RIPE RIS
2021-04-10T15:07:25+00:00 info: ris connector connected
But these 500 responses have occurred from time to time in the past. Of note is that during my testing of prefix withdrawals, the last entry in error.log indicated that we were connected at that time: info: ris connector connected
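For anyone wanting to repeat this check, here is a minimal Python sketch that counts the 500 responses and reports the last connection state recorded in error.log. It assumes the line layout shown in the excerpt above (timestamp, level, message); the names `parse_line` and `summarize` are hypothetical helpers and not part of BGPalerter.

```python
import re
from datetime import datetime

# Assumed layout, taken from the excerpt above:
# 2021-04-10T15:05:15+00:00 error: Error: Unexpected server response: 500
LINE_RE = re.compile(r"^(?P<ts>\S+)\s+(?P<level>\w+):\s+(?P<msg>.*)$")


def parse_line(line):
    """Return (timestamp, level, message), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    try:
        ts = datetime.fromisoformat(m.group("ts"))
    except ValueError:
        return None
    return ts, m.group("level"), m.group("msg")


def summarize(lines):
    """Count HTTP 500 failures and return the last connection state seen."""
    failures = 0
    last_state = None  # (timestamp, "connected" or "failed")
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue
        ts, _level, msg = parsed
        if msg == "ris connector connected":
            last_state = (ts, "connected")
        elif "Unexpected server response: 500" in msg:
            failures += 1
            last_state = (ts, "failed")
    return failures, last_state


SAMPLE = """\
2021-04-10T14:34:44+00:00 info: ris connector connected
2021-04-10T14:39:45+00:00 info: ris connector connected
2021-04-10T15:05:15+00:00 error: Error: Unexpected server response: 500
2021-04-10T15:05:15+00:00 error: It was not possible to establish a connection with RIPE RIS
2021-04-10T15:06:20+00:00 error: Error: Unexpected server response: 500
2021-04-10T15:06:20+00:00 error: It was not possible to establish a connection with RIPE RIS
2021-04-10T15:07:25+00:00 info: ris connector connected
""".splitlines()

failures, last_state = summarize(SAMPLE)
print(failures, last_state[1], last_state[0].isoformat())
```

A last state of "connected" only shows that the socket was opened; as this report illustrates, it does not show that any BGP updates were actually being delivered.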
Did RIPE change anything in RIS Live that could be breaking me ?
Unrelated, I did have some 530 responses from Cloudflare when using cloudflare as my vrpProvider for rpki which broke RPKI detection but I had switched back to ntt since and it worked fine.
As promised, in addition to the fix on the RIS side reported above, in the next release of BGPalerter there will be a check for silent socket sessions.
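To illustrate what a check for silent socket sessions could look like, here is a minimal Python sketch. It is not BGPalerter's actual implementation: the class name `SilenceWatchdog`, the `max_silence_s` parameter, and the 300-second threshold are assumptions made for this example. The idea is to record the arrival time of every message and to treat the session as silent once too much time has passed without one.

```python
import time


class SilenceWatchdog:
    """Flags a session as silent when no message arrived within max_silence_s."""

    def __init__(self, max_silence_s, clock=time.monotonic):
        self.max_silence_s = max_silence_s
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen = clock()

    def on_message(self):
        """Call this for every message received on the socket."""
        self._last_seen = self._clock()

    def is_silent(self):
        """True if the time since the last message exceeds the threshold."""
        return self._clock() - self._last_seen > self.max_silence_s


# Usage with a fake clock (hypothetical 300 s threshold).
now = [0.0]
watchdog = SilenceWatchdog(max_silence_s=300, clock=lambda: now[0])

now[0] = 200.0
watchdog.on_message()        # a message arrives at t=200
now[0] = 400.0
print(watchdog.is_silent())  # False: only 200 s since the last message
now[0] = 501.0
print(watchdog.is_silent())  # True: 301 s without a message
```

In a real client, `is_silent()` would be polled on a timer and a positive result would trigger a reconnect or an alert. The threshold needs care, since a quiet period can be legitimate when the monitored prefixes simply have no updates.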
Yes. I was able to reproduce your issue, and I contacted the main dev behind RIS, who did some digging. Somebody was flooding the service with connections (now banned); as a result, new legitimate connections were slow to be served. You spotted this because you were one of the unlucky ones; we were already connected, so we did not notice.
We are planning some improvements, including monitoring for missing or delayed messages in both BGPalerter and RIS. You will see a PR linked to this issue soon. In the meantime, a new rule limiting the number of connections per user has been set in RIS (since one connection can have unlimited subscriptions to prefixes, there is no reason at all to open multiple connections…just a lack of reading-the-doc skills).
Thanks for reporting this!!