Scraping stops working from time to time

We have deployed the latest version of the cloudwatch-exporter. We noticed that it sometimes stops getting logs from AWS and never recovers; we have to restart it to fix it. What could be the cause? Maybe the size of the response? I included some logs below:

Sep 24, 2017 1:45:24 AM io.prometheus.cloudwatch.CloudWatchCollector collect
WARNING: CloudWatch scrape failed
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[1039,14]
Message: Read timed out). Response Code: 200, Response Text: OK
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
	at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
	at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
	at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
	at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
	at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
	at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
	at java.util.Collections.list(Collections.java:3688)
	at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
	at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
	at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
	at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
	at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
	at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
	at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
	at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
	at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
	at org.eclipse.jetty.server.Server.handle(Server.java:365)
	at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
	at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
	at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
	at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
	at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
	at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
	at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
	at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
	at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
	at java.lang.Thread.run(Thread.java:745)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1039,14]
Message: Read timed out
	at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
	at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:275)
	at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
	at com.amazonaws.services.cloudwatch.model.transform.DimensionStaxUnmarshaller.unmarshall(DimensionStaxUnmarshaller.java:40)
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:54)
	at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:30)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
	at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
	at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
	at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
	... 41 more

Thanks!

Issue Analytics

  • State: closed
  • Created 6 years ago
  • Comments: 18 (11 by maintainers)

Top GitHub Comments

1 reaction
iyesin commented, Jan 22, 2018

@brian-brazil I’m sorry, I thought we were talking about IT. “Somewhere else” sounds like “works on my computer” in this situation. I described my situation, the symptoms, and a temporary solution. Here are the changes I made to the exporter (some of them are overkill, I know). They fixed cloudwatch_exporter’s non-replying behavior and allowed the error to be triggered earlier. What evidence do you need?
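
Editor's illustration (not part of the comment above, and not the commenter's actual changes, which are not reproduced on this page): one generic way to make a hung call fail early is to run it under a hard deadline. The sketch below uses only the JDK; the class name BoundedScrape and the method callWithDeadline are hypothetical, and nothing here is taken from the exporter's code.

```java
import java.time.Duration;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical helper: runs a blocking call under a hard deadline. */
public class BoundedScrape {
    // Daemon threads, so a call that never returns cannot keep the JVM alive.
    private static final ExecutorService POOL = Executors.newCachedThreadPool(r -> {
        Thread t = new Thread(r, "bounded-scrape");
        t.setDaemon(true);
        return t;
    });

    /** Returns the call's result, or throws TimeoutException once the limit passes. */
    public static <T> T callWithDeadline(Callable<T> call, Duration limit) throws Exception {
        Future<T> future = POOL.submit(call);
        try {
            return future.get(limit.toMillis(), TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            // Best effort only: an interrupt does not necessarily unblock a
            // socket read, so client-level socket timeouts are still needed.
            future.cancel(true);
            throw e;
        } catch (ExecutionException e) {
            // Surface the call's own exception rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(callWithDeadline(() -> "ok", Duration.ofSeconds(1)));
        try {
            callWithDeadline(() -> { Thread.sleep(5_000); return "late"; }, Duration.ofMillis(100));
        } catch (TimeoutException e) {
            System.out.println("timed out");
        }
    }
}
```

A wrapper like this only bounds how long the caller waits. It does not address why the read stalled in the first place, so it complements the HTTP client's own timeout settings rather than replacing them.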

1 reaction
dene14 commented, Jan 17, 2018

Hi @ivanfoo! Can you share your findings? We have been facing much the same issue recently. The configuration hasn’t been changed on our end.
