
Allow custom periods to be used with CPU utilization filters


I want to write a policy for EC2 or RDS instances that filters on maximum CPU utilization and returns the daily maximum for every day over a two-week period. At the moment I cannot do this, since the filter always compares against the first data point returned by the get_metric_statistics() call. That makes the period value useless in my metrics filter, because the maximum value will not always be the first item AWS returns.

I wrote a new function that loops over all the data points and returns the minimum, maximum, or average, depending on the filter's statistics type.

def metricsValue(self, metrics):
    # Aggregate across every returned datapoint instead of inspecting only
    # the first one.
    if len(metrics) == 0:
        return 0
    if self.statistics == 'Maximum':
        return max(metrics, key=lambda x: x[self.statistics])[self.statistics]
    elif self.statistics == 'Minimum':
        return min(metrics, key=lambda x: x[self.statistics])[self.statistics]
    elif self.statistics == 'Average':
        total = sum(m[self.statistics] for m in metrics)
        return total / len(metrics)

    # default case: fall back to the original single-datapoint behavior
    return metrics[0][self.statistics]
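A minimal standalone sketch of how this helper behaves (the `FakeMetricsFilter` class and the sample datapoints are illustrative, not part of the Cloud Custodian codebase):

```python
class FakeMetricsFilter:
    """Illustrative stand-in for the filter class that owns metricsValue."""
    def __init__(self, statistics):
        self.statistics = statistics

    def metricsValue(self, metrics):
        if len(metrics) == 0:
            return 0
        if self.statistics == 'Maximum':
            return max(metrics, key=lambda x: x[self.statistics])[self.statistics]
        elif self.statistics == 'Minimum':
            return min(metrics, key=lambda x: x[self.statistics])[self.statistics]
        elif self.statistics == 'Average':
            return sum(m[self.statistics] for m in metrics) / len(metrics)
        return metrics[0][self.statistics]

# CloudWatch datapoints come back in no guaranteed order; the helper still
# finds the true maximum no matter where it sits in the list.
datapoints = [{'Maximum': 22.0}, {'Maximum': 38.5}, {'Maximum': 31.2}]
print(FakeMetricsFilter('Maximum').metricsValue(datapoints))  # 38.5
```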

I call this from process_resource_set(). I replaced the line

elif self.op(collected_metrics[key][0][self.statistics], self.value):

with this

elif self.op(self.metricsValue(collected_metrics[key]), self.value):

With this change I could have a policy like this:

---
policies:
  - name: rds-instances-max-cpu-utilization-between-30-and-40
    resource: aws.rds
    filters:
      - DBInstanceStatus: available
      - type: value
        value_type: age
        key: InstanceCreateTime
        value: 14
        op: gte
      - type: metrics
        name: CPUUtilization
        statistics: Maximum
        value: 40
        op: lte
        days: 14
        missing-value: 0
        period: 86400
      - type: metrics
        name: CPUUtilization
        statistics: Maximum
        value: 30
        op: gt
        days: 14
        missing-value: 0

This returns all RDS instances whose daily maximum was between 30% and 40% at any point during the last two weeks. I get a data point for each day, including the ones outside the range, which meets my needs. I can do the same with average and minimum data points.

Issue Analytics

  • State: open
  • Created: 3 years ago
  • Comments: 6 (3 by maintainers)

Top GitHub Comments

1 reaction
esaari1 commented, Dec 10, 2020

Consider the sample policy I provided. I am asking for daily max values, one for each day over the last two weeks. The get_metric_statistics() API call gives me this: 14 values, but in no guaranteed order. So simply taking the first data point in the result set, as the original code did, and comparing it against my threshold will randomly pass or fail depending on the order in which AWS returns the data points. I now loop over all 14 points and return the true maximum value. This matches what I would get by asking for the max over 14 days without a specified period, which returns a single value from AWS. So I do that calculation locally instead of on AWS, but I still have 14 data points and a working max filter.
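The failure mode described here can be sketched in a few lines (the datapoint values are illustrative):

```python
# Simulated daily Maximum datapoints, in the arbitrary order CloudWatch
# might return them.
datapoints = [{'Maximum': 12.0}, {'Maximum': 44.0}, {'Maximum': 9.0}]
threshold = 40

# Original behavior: only the first datapoint is compared, so the result
# depends entirely on which point AWS happened to return first.
first_only = datapoints[0]['Maximum'] > threshold

# Looping over every datapoint gives a stable answer regardless of ordering.
true_max = max(dp['Maximum'] for dp in datapoints)
print(first_only, true_max)  # False 44.0
```

Reordering `datapoints` flips `first_only`, while `true_max` stays the same.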

0 reactions
ajkerrigan commented, Dec 12, 2020

If I’m reading the use case here properly we’d need two changes to support it:

  • Match a resource if any data point matches the given filter
  • Add an operator like “between” to provide lower and upper bounds in a single value filter

We can’t aggregate data in one shot because, as Kapil noted, that just re-implements server-side logic locally. And using an “any” match across two separate filters wouldn’t cover the “between” case in the example: you would end up with false-positive matches if you had, say, 20% utilization on Monday (matches <= 40) and 50% utilization on Tuesday (matches >= 30).
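A quick sketch of that false positive, using the daily values from the comment:

```python
# Daily maximums from the example: 20% on Monday, 50% on Tuesday.
daily_max = {'Monday': 20.0, 'Tuesday': 50.0}

# An "any" match on each filter independently: both succeed...
matches_upper = any(v <= 40 for v in daily_max.values())  # Monday's 20 matches
matches_lower = any(v >= 30 for v in daily_max.values())  # Tuesday's 50 matches
naive_between = matches_upper and matches_lower           # True: a false positive

# ...yet no single day actually fell between 30 and 40.
actual_between = any(30 <= v <= 40 for v in daily_max.values())
print(naive_between, actual_between)  # True False
```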


