Make sure `synapse_rate_limit_reject_affected_hosts` does what it says it does

Spawned from https://github.com/matrix-org/synapse/pull/13541#discussion_r951926322


The synapse_rate_limit_reject_affected_hosts gauge always evaluates to 0. For reference, the raw data in Prometheus also shows 0.

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661855196368&to=1661876796368&viewPanel=220

[Graph: all 0]

This is even though we see individual requests being rejected (synapse_rate_limit_reject_total), which should mean at least 1 affected host.

[Graph: some activity]

But this could be a mismatch in how the gauges were being reported, because we were accidentally registering them twice: https://github.com/matrix-org/synapse/issues/13641


Now that we fixed the duplicate metric registration issue in https://github.com/matrix-org/synapse/pull/13649 and the fix was deployed to matrix.org this morning, we're seeing both at 0. This could mean that the rejections we were seeing previously all came from the UsernameAvailabilityRestServlet, which we are no longer tracking, and that we're not rejecting any requests in the federation servlets.

It is a bit suspicious though.

[Graph: both metrics at 0 after the deploy]

How can we know if it’s right?

In order to confirm that synapse_rate_limit_reject_affected_hosts is working, it would be nice to see a non-zero value.

The reject_limit is 50, which I think means there have to be more than 50 requests within the 1-second federation_rc_window_size before we start rejecting.

We do see the rate of slept requests go above 70 sometimes which I would expect to trigger this 🤔

https://grafana.matrix.org/d/dYoRgTgVz/messages-timing?orgId=1&from=1661873701156&to=1661877301156&viewPanel=223

[Graph: rate of slept requests, with spikes above 50]
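To make the reading of reject_limit above concrete, here is a minimal toy model of it in Python. It is only a sketch of the hypothesis stated in this issue ("more than 50 requests within the 1-second window"), not Synapse's actual ratelimiter. The class `WindowedRejecter` and its methods are hypothetical names invented for illustration.

```python
from collections import deque


class WindowedRejecter:
    """Toy model (hypothetical, not Synapse code): reject a host's request
    once more than `reject_limit` of its requests fall inside the window."""

    def __init__(self, reject_limit=50, window_size_s=1.0):
        self.reject_limit = reject_limit
        self.window_size_s = window_size_s
        # host name -> deque of request timestamps still inside the window
        self._timestamps = {}

    def should_reject(self, host, now):
        stamps = self._timestamps.setdefault(host, deque())
        # Drop requests that have aged out of the window.
        while stamps and now - stamps[0] >= self.window_size_s:
            stamps.popleft()
        stamps.append(now)
        return len(stamps) > self.reject_limit

    def affected_hosts(self, now):
        """Gauge-style reading: hosts over the limit at the instant `now`."""
        return sum(
            1
            for stamps in self._timestamps.values()
            if sum(1 for t in stamps if now - t < self.window_size_s)
            > self.reject_limit
        )


limiter = WindowedRejecter(reject_limit=50, window_size_s=1.0)
# 70 requests from one host, spread over 0.69 seconds.
results = [limiter.should_reject("a.example", i * 0.01) for i in range(70)]
```

In this toy model the first 50 requests pass and the remaining 20 are rejected. The gauge-style reading is 1 if sampled during the burst and 0 if sampled after the window has passed, so under these assumptions a point-in-time gauge can read 0 even when a rejection counter has moved. Whether that applies to the real metric would need checking against the Synapse code.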

Dev notes

The synapse_rate_limit_reject_affected_hosts metric was originally added in https://github.com/matrix-org/synapse/pull/13541 and updated in https://github.com/matrix-org/synapse/pull/13649

Issue Analytics

  • State: closed
  • Created a year ago
  • Comments: 5 (1 by maintainers)

Top GitHub Comments

1 reaction
MadLittleMods commented, Aug 30, 2022

@DMRobertson Yes but it’s the best we can do to count hosts AFAIK

0 reactions
MadLittleMods commented, Dec 12, 2022

Seeing this more active on other hosts (libera.chat):

Going to close as it seems to count something.
