Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Scraping stops working from time to time

See original GitHub issue

We have deployed the last version of the cloudwatch-exporter. We noticed that it stops getting logs from AWS sometimes and never recovers, having to restart it to fix it. What could be the cause? Maybe the size of the response? I included some logs below:

WARNING: CloudWatch scrape failed
Message: Read timed out). Response Code: 200, Response Text: OK
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
	at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
	at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
	at java.util.Collections.list(Collections.java:3688)
	at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:365)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
	at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[1039,14]
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:30)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
	... 41 more
Sep 24, 2017 1:45:24 AM io.prometheus.cloudwatch.CloudWatchCollector collect
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
	at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
	at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1039,14]
Message: Read timed out
	at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:275)
	at com.amazonaws.services.cloudwatch.model.transform.DimensionStaxUnmarshaller.unmarshall(DimensionStaxUnmarshaller.java:40)
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:54)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)

Thanks!

Issue Analytics

State:
Created 6 years ago
Comments:18 (11 by maintainers)

Top GitHub Comments

1reaction

iyesincommented, Jan 22, 2018

@brian-brazil I’m sorry, I thought we are talking about IT. “Somewhere else” looks like “works on my computer” in this situation. I described you my situation, symptoms, temporary solution. Here is changes I made to exporter (some of them is overkill I know). This fixed cloudwatch_exporter’s non-replying behavior and allowed to trigger error earlier. What evidence what do you need?

1reaction

dene14commented, Jan 17, 2018

Hi @ivanfoo ! Can you share your findings? Facing quite the same issue since recently. Configuration hasn’t been changed on our end.