Inconsistent alerts, looking for debug methods
We’ve been doing some targeted testing of BGPalerter over the last few days, specifically of route announcements and withdrawals. The goal is to get a feel for how consistently it will let us know about ‘real issues’.
We have reports going to Slack, email, and syslog.
Our config.yml was auto-generated from v1.30 a day or two ago; the changes from the defaults are:
- uncommented monitorNewPrefix in monitors:
- enabled syslog, Slack, and email addresses/webhooks (all ‘channels’ are selected for these)
We announce several dozen prefixes. For testing, we’ve been announcing and withdrawing a /23 and a /24 within a /16. We then changed prefixes.yml to monitor only that single /16, and the behaviour is the same: right now we get an alert for perhaps one out of every 30 to 40 announcements or withdrawals, at best. We’ve tried this at various rates, from a few changes in the same hour to waiting overnight before making a change. The rate seems to have no bearing on whether an alert arrives; generally, it simply doesn’t. The prefixes.yml entry is now:
216.56.0.0/16:
  description: testing
  asn:
    - 2381
  ignoreMorespecifics: false
  ignore: false
  group: noc
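To make the matching explicit, here is a minimal Python sketch of how an entry like the one above can be checked against an announced prefix. It uses only the standard ipaddress module; the function name matches_entry, the dict layout, and the example /23 and /24 are hypothetical, and this is not BGPalerter’s own code. With ignoreMorespecifics: false, test announcements inside 216.56.0.0/16 count as covered.

```python
import ipaddress

# Hypothetical in-memory form of the prefixes.yml entry above.
ENTRY = {
    "prefix": "216.56.0.0/16",
    "ignoreMorespecifics": False,
    "ignore": False,
}


def matches_entry(announced: str, entry: dict) -> bool:
    """Return True if an announced prefix falls under a monitored entry.

    Simplified illustration: an exact match always counts, and a
    more-specific counts unless ignoreMorespecifics is set.
    """
    if entry["ignore"]:
        return False
    monitored = ipaddress.ip_network(entry["prefix"])
    net = ipaddress.ip_network(announced)
    # subnet_of() needs both networks to be the same IP version.
    if net.version != monitored.version or not net.subnet_of(monitored):
        return False
    if net != monitored and entry["ignoreMorespecifics"]:
        return False
    return True


if __name__ == "__main__":
    for p in ("216.56.10.0/23", "216.56.20.0/24", "216.57.0.0/24"):
        print(p, matches_entry(p, ENTRY))
```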
The process has about 1.5G of virtual memory mapped (roughly 650M resident):
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
bgpaler+ 8965 4.5 2.7 1514832 665636 ? Ssl 09:59 1:35 /opt/bgpalerter/bgpalerter-linux-x64
# pmap 8965 | tail -1
total 1512132K
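The 1.5G figure is the virtual size (ps VSZ, and the pmap total), not resident memory. A quick conversion of the numbers above, which ps and pmap both report in KiB:

```python
# Figures copied from the ps and pmap output above; both tools report KiB.
VSZ_KIB = 1514832         # ps VSZ: virtual address space
RSS_KIB = 665636          # ps RSS: resident (physical) memory
PMAP_TOTAL_KIB = 1512132  # pmap total: virtual mappings again


def kib_to_gib(kib: int) -> float:
    """Convert KiB to GiB (1 GiB = 1024 * 1024 KiB)."""
    return kib / (1024 ** 2)


def kib_to_mib(kib: int) -> float:
    """Convert KiB to MiB (1 MiB = 1024 KiB)."""
    return kib / 1024


if __name__ == "__main__":
    print(f"VSZ  {kib_to_gib(VSZ_KIB):.2f} GiB")
    print(f"pmap {kib_to_gib(PMAP_TOTAL_KIB):.2f} GiB")
    print(f"RSS  {kib_to_mib(RSS_KIB):.0f} MiB")
```

Both virtual figures come out at about 1.44 GiB, while the resident set is about 650 MiB.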
For debugging purposes, we’re monitoring https://ris-live.ripe.net/ using a modified version of the Python script on that page (happy to share it if anyone is interested; it just monitors all data for the source ASN). It always shows the changes: approximately 70 updates each time we make a change, which is above the monitorVisibility threshold (thresholdMinPeers: 40).
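Counting raw updates is not the same as counting peers that lost the prefix. The Python sketch below (standard library only) tallies announcements (A), withdrawals (W), and distinct peers for one prefix from RIS Live-style messages. The message shape (type, data.peer, data.announcements[].prefixes, data.withdrawals) is modeled on RIS Live’s ris_message and should be checked against the RIS Live documentation; summarize, _msg, and the sample data are hypothetical.

```python
import json
from collections import Counter


def summarize(raw_messages, prefix):
    """Count raw updates, A/W split, and distinct peers for one prefix.

    Each raw message is assumed to be a JSON string shaped like a RIS Live
    "ris_message" (an assumption of this sketch).
    """
    kinds = Counter()
    peers = set()
    for raw in raw_messages:
        msg = json.loads(raw)
        if msg.get("type") != "ris_message":
            continue
        data = msg["data"]
        announced = any(
            prefix in ann.get("prefixes", [])
            for ann in data.get("announcements", [])
        )
        withdrawn = prefix in data.get("withdrawals", [])
        if announced:
            kinds["A"] += 1
        if withdrawn:
            kinds["W"] += 1
        if announced or withdrawn:
            peers.add(data["peer"])
    return {
        "updates": kinds["A"] + kinds["W"],
        "A": kinds["A"],
        "W": kinds["W"],
        "peers": len(peers),
    }


def _msg(peer, announce=None, withdraw=None):
    """Build a hypothetical RIS Live-style message for the example."""
    data = {"peer": peer, "announcements": [], "withdrawals": []}
    if announce:
        data["announcements"].append({"next_hop": peer, "prefixes": [announce]})
    if withdraw:
        data["withdrawals"].append(withdraw)
    return json.dumps({"type": "ris_message", "data": data})


SAMPLE = [
    _msg("192.0.2.1", announce="216.56.10.0/23"),
    _msg("192.0.2.1", announce="216.56.10.0/23"),  # same peer, path change
    _msg("192.0.2.2", withdraw="216.56.10.0/23"),
    _msg("192.0.2.3", announce="216.56.10.0/23"),
    _msg("192.0.2.4", announce="198.51.100.0/24"),  # unrelated prefix
]

if __name__ == "__main__":
    print(summarize(SAMPLE, "216.56.10.0/23"))
```

In the sample, four updates touch the prefix, but only one of them is a withdrawal.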
In short, we’d like to know which debug flags, if any, we can pass to BGPalerter to help determine where this issue, and potentially others, lies.
I think I now understand what you are doing and it is not correct.
That is normal. What you see is what the peer is propagating to the RRC (the best path it has to reach your AS). If the peer cannot see the prefix at all, you will see a withdrawal (W). Otherwise, you will see one or more path changes (announcements, A).
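A simplified model of that reasoning: replay the updates in order and let the last update from each peer decide whether it still sees the prefix. A path change is an A, so it keeps the peer counted as visible. This illustrates the idea under that assumption and is not BGPalerter’s internal logic; the function names are hypothetical, and reading thresholdMinPeers as "at least N peers without the prefix" is also an assumption.

```python
from typing import Dict, Iterable, Tuple

Update = Tuple[str, str]  # (peer address, "A" or "W")


def peers_without_visibility(updates: Iterable[Update]) -> int:
    """Replay updates in order; the last update from each peer wins.

    An "A" (including a mere path change) means the peer still sees the
    prefix; only a "W" means it has lost it.
    """
    state: Dict[str, bool] = {}
    for peer, kind in updates:
        if kind not in ("A", "W"):
            raise ValueError(f"unexpected update kind: {kind!r}")
        state[peer] = kind == "A"
    return sum(1 for visible in state.values() if not visible)


def visibility_lost(updates: Iterable[Update], threshold_min_peers: int = 40) -> bool:
    """True once at least threshold_min_peers peers have withdrawn (assumed rule)."""
    return peers_without_visibility(updates) >= threshold_min_peers
```

With this model, 70 updates that are all path changes never cross the threshold, no matter how many there are.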
What you should monitor for is withdrawals; do not monitor for updates and expect them to be withdrawals. BGPalerter is telling you again that you have a new prefix because you removed all of its “memory” by setting a low value of notificationIntervalSeconds, so every update, including path changes, is now considered new visibility. At some point, if and when visibility is gone on at least 40 peers, you will get a visibility alert. You can easily see what I mean by loading your prefix and time range in BGPlay: you will see all the path changes and when the visibility goes down. Check the RIS page for more info on the data collected.
See my answer above.
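The effect of notificationIntervalSeconds described above can be modeled as a per-alert suppression window. The class below is a hypothetical sketch of that idea, not BGPalerter’s implementation; the alert key is made up, and timestamps are passed in explicitly so the result is deterministic.

```python
from typing import Dict, Hashable, Iterable


class AlertGate:
    """Suppress repeats of the same alert key within an interval.

    A simplified model of a notificationIntervalSeconds-style setting.
    """

    def __init__(self, interval_seconds: float) -> None:
        self.interval = interval_seconds
        self.last_sent: Dict[Hashable, float] = {}

    def should_send(self, key: Hashable, now: float) -> bool:
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval:
            return False  # still inside the window: already reported
        self.last_sent[key] = now
        return True


def count_alerts(interval_seconds: float, update_times: Iterable[float]) -> int:
    """Count how many alerts one repeated event produces over time."""
    gate = AlertGate(interval_seconds)
    key = ("newprefix", "216.56.10.0/23")  # hypothetical alert key
    return sum(gate.should_send(key, t) for t in update_times)


if __name__ == "__main__":
    times = [i * 10 for i in range(70)]  # 70 updates, 10 seconds apart
    print(count_alerts(86400, times), count_alerts(5, times))
```

With a day-long interval, 70 updates spaced 10 seconds apart produce a single alert; with a 5-second interval, every one of them does.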
BGPalerter keeps an optimized structure in memory, which is persisted to .cache on exit and restored from it on start.
Thanks for the details. For whatever reason, I wasn’t receiving alerts even though nothing was in .cache when I had that timer set longer. I thought I was doing a full stop and start of the process, but perhaps it was just a HUP.
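The format of BGPalerter’s .cache is not described in this thread, so the sketch below only illustrates the general pattern (state held in memory, written out on exit, read back on start) using a hypothetical JSON file and hypothetical function names. The practical consequence: emptying the cache file resets nothing unless the process actually exits and starts again, because a process that keeps running still holds its in-memory state.

```python
import json
import os
import tempfile
from typing import Any, Dict


def save_state(path: str, state: Dict[str, Any]) -> None:
    """Write state atomically: dump to a temp file, then rename over path."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as fh:
            json.dump(state, fh)
        os.replace(tmp, path)
    except BaseException:
        os.unlink(tmp)  # don't leave a half-written temp file behind
        raise


def load_state(path: str) -> Dict[str, Any]:
    """Return the saved state, or an empty dict if there is no cache yet."""
    try:
        with open(path) as fh:
            return json.load(fh)
    except FileNotFoundError:
        return {}
```

A fresh start with no cache file begins with empty state; a restart after a clean exit picks up exactly what was saved.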
Anyhow, setting notificationIntervalSeconds really was the key to my testing. Now that I’ve set it lower and put in my ‘real’ config, which specifies each advertisement as well as the parent prefixes, it seems to be working as intended: visibility alerts now appear for a specific /23 or /24 as long as that prefix is listed.
If I had only the parent /16 listed, I got announcement alerts any time it became visible or not, which is better than nothing.
One additional thing I’ve noticed with the RIPE beacons, for which we’re monitoring specific /24s: I only receive visibility alerts (when they are withdrawn) and never see announcement alerts. That must be by design.
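Both observations are consistent with monitorNewPrefix reporting announcements that are more specific than a listed entry, while an announcement of exactly a listed prefix (such as a monitored beacon /24 coming back) is not "new". That reading is an assumption here. The sketch below only shows the longest-prefix-match classification it relies on; the function names, labels, and example prefixes are hypothetical.

```python
import ipaddress
from typing import List, Optional, Union

Net = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]


def most_specific_entry(announced: str, monitored: List[str]) -> Optional[Net]:
    """Longest-prefix match of an announced prefix against listed entries."""
    net = ipaddress.ip_network(announced)
    candidates = [
        m for m in map(ipaddress.ip_network, monitored)
        if m.version == net.version and net.subnet_of(m)
    ]
    return max(candidates, key=lambda m: m.prefixlen, default=None)


def classify(announced: str, monitored: List[str]) -> str:
    """Label an announcement as "exact", "more-specific" or "unmonitored"."""
    entry = most_specific_entry(announced, monitored)
    if entry is None:
        return "unmonitored"
    if entry == ipaddress.ip_network(announced):
        return "exact"
    return "more-specific"


if __name__ == "__main__":
    print(classify("216.56.10.0/23", ["216.56.0.0/16"]))
    print(classify("216.56.10.0/23", ["216.56.0.0/16", "216.56.10.0/23"]))
```

With only the /16 listed, the /23 is a more-specific; once the /23 itself is listed, the same announcement becomes an exact match.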
Thanks for the help