Prometheus is failing to pull metrics from the latest image
Yesterday, one of my CloudWatch exporters was restarted, and because the only available image tag is latest, my cluster pulled the image that was pushed to Docker Hub 6 days ago. The previous image was running without any issues.

Since the container restarted, Prometheus has been complaining about the format of the exporter's output.
Prometheus logs
time="2018-06-25T08:53:16Z" level=error msg="append failed" err="out of bounds"
source="scrape.go:518" target="{__address__="10.101.90.4:9100",
__metrics_path__="/metrics", __scheme__="http", endpoint="0",
instance="10.101.90.4:9100", job="cloudwatch-exporter",
namespace="monitoring", pod="cloudwatch-exporter-2869121175-zvctm",
service="cloudwatch-exporter"}"
Cloudwatch metrics endpoint
# HELP cloudwatch_requests_total API requests made to CloudWatch
# TYPE cloudwatch_requests_total counter
cloudwatch_requests_total 1645.0
# HELP aws_rds_free_storage_space_average CloudWatch metric AWS/RDS FreeStorageSpace Dimensions: [DBInstanceIdentifier] Statistic: Average Unit: Bytes
# TYPE aws_rds_free_storage_space_average gauge
aws_rds_free_storage_space_average{job="aws_rds",instance="",dbinstance_identifier="ooo",} 5.7720643584E10 1529914980000
aws_rds_free_storage_space_average{job="aws_rds",instance="",dbinstance_identifier="bbbb",} 2.6073378816E10 1529914980000
# HELP aws_ebs_burst_balance_average CloudWatch metric AWS/EBS BurstBalance Dimensions: [VolumeId] Statistic: Average Unit: Percent
# TYPE aws_ebs_burst_balance_average gauge
aws_ebs_burst_balance_average{job="aws_ebs",instance="",volume_id="vol-2222",} 100.0 1529914800000
# HELP aws_ec2_status_check_failed_average CloudWatch metric AWS/EC2 StatusCheckFailed Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_average gauge
aws_ec2_status_check_failed_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ec2_status_check_failed_instance_average CloudWatch metric AWS/EC2 StatusCheckFailed_Instance Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_instance_average gauge
aws_ec2_status_check_failed_instance_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ec2_status_check_failed_system_average CloudWatch metric AWS/EC2 StatusCheckFailed_System Dimensions: [InstanceId] Statistic: Average Unit: Count
# TYPE aws_ec2_status_check_failed_system_average gauge
aws_ec2_status_check_failed_system_average{job="aws_ec2",instance="",instance_id="i-222",} 0.0 1529914980000
# HELP aws_ses_send_sum CloudWatch metric AWS/SES Send Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_send_sum gauge
aws_ses_send_sum{job="aws_ses",instance="",} 1781.0 1529828640000
# HELP aws_ses_delivery_sum CloudWatch metric AWS/SES Delivery Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_delivery_sum gauge
aws_ses_delivery_sum{job="aws_ses",instance="",} 1762.0 1529828640000
# HELP aws_ses_bounce_sum CloudWatch metric AWS/SES Bounce Dimensions: null Statistic: Sum Unit: Count
# TYPE aws_ses_bounce_sum gauge
aws_ses_bounce_sum{job="aws_ses",instance="",} 15.0 1529828640000
# HELP cloudwatch_exporter_scrape_duration_seconds Time this CloudWatch scrape took, in seconds.
# TYPE cloudwatch_exporter_scrape_duration_seconds gauge
cloudwatch_exporter_scrape_duration_seconds 2.595850326
# HELP cloudwatch_exporter_scrape_error Non-zero if this scrape failed.
# TYPE cloudwatch_exporter_scrape_error gauge
cloudwatch_exporter_scrape_error 0.0
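A note on the error above, as a hedged reading of the data rather than a confirmed diagnosis: every CloudWatch sample in this exposition carries an explicit timestamp, and some of them (the aws_ses_* samples) lag the scrape time by roughly a day. Prometheus returns the "out of bounds" append error when a sample's timestamp falls before the range the current TSDB head will accept, so stale exporter-supplied timestamps would explain the log entry. Prometheus releases from 2.9 onward (not the 2.0.0-alpha.2 build mentioned below) can be told to ignore exporter-supplied timestamps per job; a minimal sketch of such a scrape_config, reusing the job name and address from the error log:
scrape_configs:
  - job_name: cloudwatch-exporter        # job name taken from the error log above
    honor_timestamps: false              # record samples at scrape time instead of trusting exporter timestamps
    static_configs:
      - targets: ['10.101.90.4:9100']    # address taken from the error log above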
Note: I’m using Prometheus 2.0.0-alpha.2.
Note 2: Can this project tag images before pushing them to Docker Hub?
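Until versioned tags are published, one way to keep a restart from silently pulling a newer build is to pin the container to an image digest in the pod spec. A minimal sketch, assuming the image lives at prom/cloudwatch-exporter; the digest value is a placeholder, not the digest of any real build:
containers:
  - name: cloudwatch-exporter
    # Pinning by digest means a restart re-pulls exactly this build,
    # even though the repository only publishes the latest tag.
    # Replace the placeholder with the digest of the known-good image.
    image: prom/cloudwatch-exporter@sha256:<digest-of-known-good-image>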
I can confirm @gianrubio's statement.
Now metrics collection is working again.
I would also suggest creating tags on Docker Hub like you do on GitHub. Otherwise we cannot pin to a specific image and will end up with breaking images in production.
I don’t think so; as I said before, this only happens with the latest image. Rolling back to this commit fixed the issue.