
Allow custom periods to be used with CPU utilization filters


I want to write a policy for EC2 or RDS instances that filters on maximum CPU utilization and returns the daily maximum for every day over a two-week period. At the moment I cannot do this, since the filter always compares against the first data point returned by the get_metric_statistics() call. That makes the period value useless in my metrics filter, because the maximum value will not always be the first item AWS returns.

I wrote a new function that loops over all the data points and returns the minimum, maximum, or average, depending on the filter's statistics type.

def metricsValue(self, metrics):
    # Aggregate across every returned datapoint instead of inspecting only
    # the first one.
    if len(metrics) == 0:
        return 0
    if self.statistics == 'Maximum':
        return max(metrics, key=lambda x: x[self.statistics])[self.statistics]
    elif self.statistics == 'Minimum':
        return min(metrics, key=lambda x: x[self.statistics])[self.statistics]
    elif self.statistics == 'Average':
        total = sum(m[self.statistics] for m in metrics)
        return total / len(metrics)

    # default case: fall back to the original single-datapoint behavior
    return metrics[0][self.statistics]
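A minimal standalone sketch of how this helper behaves (the `FakeMetricsFilter` class and the sample datapoints are illustrative, not part of the Cloud Custodian codebase):

```python
class FakeMetricsFilter:
    """Illustrative stand-in for the filter class that owns metricsValue."""
    def __init__(self, statistics):
        self.statistics = statistics

    def metricsValue(self, metrics):
        if len(metrics) == 0:
            return 0
        if self.statistics == 'Maximum':
            return max(metrics, key=lambda x: x[self.statistics])[self.statistics]
        elif self.statistics == 'Minimum':
            return min(metrics, key=lambda x: x[self.statistics])[self.statistics]
        elif self.statistics == 'Average':
            return sum(m[self.statistics] for m in metrics) / len(metrics)
        return metrics[0][self.statistics]

# CloudWatch datapoints come back in no guaranteed order; the helper still
# finds the true maximum no matter where it sits in the list.
datapoints = [{'Maximum': 22.0}, {'Maximum': 38.5}, {'Maximum': 31.2}]
print(FakeMetricsFilter('Maximum').metricsValue(datapoints))  # 38.5
```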

I call this from process_resource_set(). I replaced the line

elif self.op(collected_metrics[key][0][self.statistics], self.value):

with this

elif self.op(self.metricsValue(collected_metrics[key]), self.value):

With this change I could have a policy like this:

---
policies:
  - name: rds-instances-max-cpu-utilization-between-30-and-40
    resource: aws.rds
    filters:
      - DBInstanceStatus: available
      - type: value
        value_type: age
        key: InstanceCreateTime
        value: 14
        op: gte
      - type: metrics
        name: CPUUtilization
        statistics: Maximum
        value: 40
        op: lte
        days: 14
        missing-value: 0
        period: 86400
      - type: metrics
        name: CPUUtilization
        statistics: Maximum
        value: 30
        op: gt
        days: 14
        missing-value: 0

This returns all RDS instances whose daily maximum was between 30% and 40% at any point during the last two weeks. I get a data point for each day, including the ones outside the range, which meets my needs. I can do the same with average and minimum data points.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
esaari1 commented, Dec 10, 2020

Consider the sample policy I provided. I am asking for daily max values, one for each day over the last two weeks. The get_metric_statistics() API call gives me this: 14 values, but in no guaranteed order. So simply taking the first data point in the result set, as the original code did, and comparing it against my threshold will randomly pass or fail depending on the order in which AWS returns the data points. I now loop over all 14 points and return the true maximum value. This matches what I would get by asking for the max over 14 days without a specified period, which returns a single value from AWS. So I do that calculation locally instead of on AWS, but I still have 14 data points and a working max filter.
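The failure mode described here can be sketched in a few lines (the datapoint values are illustrative):

```python
# Simulated daily Maximum datapoints, in the arbitrary order CloudWatch
# might return them.
datapoints = [{'Maximum': 12.0}, {'Maximum': 44.0}, {'Maximum': 9.0}]
threshold = 40

# Original behavior: only the first datapoint is compared, so the result
# depends entirely on which point AWS happened to return first.
first_only = datapoints[0]['Maximum'] > threshold

# Looping over every datapoint gives a stable answer regardless of ordering.
true_max = max(dp['Maximum'] for dp in datapoints)
print(first_only, true_max)  # False 44.0
```

Reordering `datapoints` flips `first_only`, while `true_max` stays the same.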

0 reactions
ajkerrigan commented, Dec 12, 2020

If I’m reading the use case here properly we’d need two changes to support it:

  • Match a resource if any data point matches the given filter
  • Add an operator like “between” to provide lower and upper bounds in a single value filter

We can’t aggregate data in one shot because, as Kapil noted, that just re-implements server-side logic locally. And using an “any” match across two separate filters wouldn’t cover the “between” case in the example: you would end up with false-positive matches if you had, say, 20% utilization on Monday (matches <= 40) and 50% utilization on Tuesday (matches >= 30).
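A quick sketch of that false positive, using the daily values from the comment:

```python
# Daily maximums from the example: 20% on Monday, 50% on Tuesday.
daily_max = {'Monday': 20.0, 'Tuesday': 50.0}

# An "any" match on each filter independently: both succeed...
matches_upper = any(v <= 40 for v in daily_max.values())  # Monday's 20 matches
matches_lower = any(v >= 30 for v in daily_max.values())  # Tuesday's 50 matches
naive_between = matches_upper and matches_lower           # True: a false positive

# ...yet no single day actually fell between 30 and 40.
actual_between = any(30 <= v <= 40 for v in daily_max.values())
print(naive_between, actual_between)  # True False
```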


