Inconsistent alerts, looking for debug methods

We’ve been doing some targeted testing of BGPalerter over the last few days, specifically for route announcements and withdrawals. The goal is to get a feel for how consistently it will let us know of ‘real issues’.

We have reports going to Slack, email, and syslog.

Our config.py was auto-generated by v1.30 a day or two ago. Changes from the defaults are:

  • uncommented monitorNewPrefix in monitors:
  • enabled syslog, Slack, and email addresses/webhooks (all ‘channels’ are selected for these)

We announce several dozen prefixes. For testing, we’ve been announcing and withdrawing a /23 and a /24 within a /16. We then changed prefixes.yml to monitor only that single /16 for further testing, and the behaviour is the same: right now perhaps one out of every 30 to 40 announcements or withdrawals triggers an alert, at best. We’ve done this at various rates, from a few changes in the same hour to waiting until overnight to make a change. This seems to have no bearing on whether the alert comes in, as it generally simply doesn’t.

216.56.0.0/16:
  description: testing
  asn:
    - 2381
  ignoreMorespecifics: false
  ignore: false
  group: noc
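As an illustration of how the test prefixes relate to the monitored /16, here is a minimal Python sketch using only the standard library. The function name `is_covered` and its `ignore_more_specifics` parameter are hypothetical and only mirror the intent of the ignoreMorespecifics setting; this is not BGPalerter’s own matching code. The /24 used in the usage note is a made-up example, since the actual /24 is not given above.

```python
import ipaddress


def is_covered(announced: str, monitored: str, ignore_more_specifics: bool = False) -> bool:
    """Return True if an announced prefix falls under a monitored prefix.

    Hypothetical helper for illustration; not BGPalerter's implementation.
    """
    ann = ipaddress.ip_network(announced)
    mon = ipaddress.ip_network(monitored)
    if ann.version != mon.version:
        return False
    if ann == mon:
        # An exact match is always covered.
        return True
    # A more specific counts only when more specifics are not ignored.
    return (not ignore_more_specifics) and ann.subnet_of(mon)
```

For example, `is_covered("216.56.252.0/23", "216.56.0.0/16")` is true, and so is a hypothetical `216.56.10.0/24`, while the same calls with `ignore_more_specifics=True` are false.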

The process is taking about 1.5G of virtual memory (roughly 650M resident).

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
bgpaler+  8965  4.5  2.7 1514832 665636 ?      Ssl  09:59   1:35 /opt/bgpalerter/bgpalerter-linux-x64

# pmap 8965 | tail -1
 total          1512132K

For debugging purposes, we’re monitoring https://ris-live.ripe.net/ using a modified version of the Python script on that page (we can provide it to anyone interested; it just monitors all data for the source ASN). This always shows the changes, with approximately 70 updates each time we make a change, which is above the monitorVisibility threshold of thresholdMinPeers: 40.
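A rough sketch of the message handling such a script might do is shown below. It is not the script referred to above. The helper names `subscribe_message` and `summarize` are hypothetical, and the field names (`ris_subscribe`, `ris_message`, `announcements`, `withdrawals`) are written as I understand them from the RIS Live manual, so they should be checked against it. The WebSocket connection itself is left out so the block stays standard-library only.

```python
import json


def subscribe_message(prefix: str) -> str:
    """Build a subscription request for a prefix and its more specifics."""
    return json.dumps({
        "type": "ris_subscribe",
        "data": {"prefix": prefix, "moreSpecific": True, "type": "UPDATE"},
    })


def summarize(raw: str) -> dict:
    """Split one received message into announced and withdrawn prefixes."""
    msg = json.loads(raw)
    if msg.get("type") != "ris_message":
        # Not a BGP message (for example an error or a pong): nothing to report.
        return {"peer": None, "announced": [], "withdrawn": []}
    data = msg.get("data", {})
    announced = [
        prefix
        for announcement in data.get("announcements", [])
        for prefix in announcement.get("prefixes", [])
    ]
    withdrawn = list(data.get("withdrawals", []))
    return {"peer": data.get("peer"), "announced": announced, "withdrawn": withdrawn}
```

Logging the `withdrawn` list separately from the `announced` list makes it easier to tell a real withdrawal from a path change that merely re-announces the prefix.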

In short, we’re looking to see what debug flags, if any, we can give BGPalerter to help determine where this issue, and potentially others, lie.

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 9 (9 by maintainers)

Top GitHub Comments

massimocandela commented, Oct 26, 2022

I think I now understand what you are doing and it is not correct.

> I do indeed see about 2k "type": "W" on that link, but 12k "type": "A".

That is normal. What you see is what the peer is propagating to the RRC (which is the best path to reach your AS). If the peer cannot see the prefix at all, you will see a W. Otherwise you will see one or more path changes (A).

> All that I know is that if it’s supposed to be listed in "withdrawals": [], it’s not, and the alarm from BGPalerter said it was an advertisement, not a withdrawal.

You should monitor for withdrawals, rather than monitoring for updates and expecting them to be withdrawals. BGPalerter is again telling you that you have a newprefix because you removed all of its “memory” by setting a low value of notificationIntervalSeconds, so each update, including path changes, is now considered new visibility. At some point, if and when the visibility on at least 40 peers is gone, you will get a visibility alert. You can easily see what I mean by loading your prefix and time range in BGPlay: you will see all the path changes, and you will see when the visibility goes down. Check the RIS page for more info on the data collected.
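To make the distinction between path changes and lost visibility concrete, here is a simplified Python sketch. The class `VisibilityTracker` and its method names are hypothetical, and the logic is an assumption about the general idea only (count the distinct peers whose latest update for a prefix was a withdrawal), not BGPalerter’s actual implementation.

```python
class VisibilityTracker:
    """Track which peers have withdrawn a prefix (illustrative only)."""

    def __init__(self, threshold_min_peers: int = 40):
        self.threshold_min_peers = threshold_min_peers
        self.withdrawn_peers = {}  # prefix -> set of peers that withdrew it

    def update(self, prefix: str, peer: str, kind: str) -> bool:
        """Record an update of kind "A" or "W".

        Returns True only on the update that brings the number of
        withdrawing peers up to the threshold.
        """
        peers = self.withdrawn_peers.setdefault(prefix, set())
        before = len(peers)
        if kind == "W":
            peers.add(peer)
        elif kind == "A":
            # A path change means this peer still (or again) sees the prefix.
            peers.discard(peer)
        else:
            raise ValueError(f"unknown update kind: {kind!r}")
        return before < self.threshold_min_peers <= len(peers)
```

Under this model, any number of "A" updates never produces a visibility alert; only withdrawals from enough distinct peers do.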

> I tested again while typing this, but with 216.56.252.0/23, and got the same ‘newprefix’ alert, when the change we made was most definitely removing it.

See my answer above.

> Follow-up question on your first note about notificationIntervalSeconds: where is the ‘memory’ of alerts held? I see there is a .cache directory, but before I set that value lower, it had nothing in it. I would stop and start BGPalerter, but it would still not alarm. Is there a cache of some type stored outside of that area?

BGPalerter keeps an optimized structure in memory, which gets persisted/restored on exit/start in .cache.
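The effect of such a “memory” combined with a notification interval can be sketched as follows. This is a hedged illustration, not BGPalerter’s data structure: the class `AlertMemory`, the alert key format, and the JSON serialization are all assumptions, and the 86400-second interval is just an example of a long value.

```python
import json


class AlertMemory:
    """Remember when each alert was last sent (illustrative only)."""

    def __init__(self, interval_seconds: int, last_sent=None):
        self.interval_seconds = interval_seconds
        self.last_sent = dict(last_sent or {})  # alert key -> last send time

    def should_notify(self, key: str, now: float) -> bool:
        """Return True if no alert for this key was sent within the interval."""
        last = self.last_sent.get(key)
        if last is not None and now - last < self.interval_seconds:
            return False  # still remembered, so the repeat is suppressed
        self.last_sent[key] = now
        return True

    def dump(self) -> str:
        """Serialize the memory, e.g. to persist it on exit."""
        return json.dumps(self.last_sent)

    @classmethod
    def load(cls, interval_seconds: int, serialized: str) -> "AlertMemory":
        """Restore a previously dumped memory, e.g. on start."""
        return cls(interval_seconds, json.loads(serialized))
```

With a long interval, a repeated event an hour later stays silent even after a dump and load cycle; with a very short interval, the same event notifies again each time, which matches the repeated ‘newprefix’ alerts described above.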

falz commented, Oct 27, 2022

Thanks for the details. For whatever reason, I wasn’t receiving alerts even though nothing was in .cache when I had that timer set longer. I thought I was doing a full stop and start of the process, but perhaps it was just a HUP.

Anyhow, setting notificationIntervalSeconds really was the key to my testing. Now that I’ve set it lower and put in my ‘real’ config, which specifies each advertisement as well as the parent prefixes, it seems to be working as intended: visibility alerts now appear for a specific /23 or /24 as long as I have that prefix listed.

With only the parent /16 prefix listed, I got announcement alerts any time it became visible or not, which is better than nothing.

One additional note on the RIPE beacons, for which we’re monitoring specific /24s: I only receive visibility alerts (when they are removed) and I never see announcement alerts. This must be by design.

Thanks for the help
