Scraping stops working from time to time
We have deployed the latest version of the cloudwatch-exporter. We noticed that it sometimes stops getting logs from AWS and never recovers; we have to restart it to get it working again. What could be the cause? Maybe the size of the response? I have included some logs below:
WARNING: CloudWatch scrape failed
Message: Read timed out). Response Code: 200, Response Text: OK
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1525)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeHelper(AmazonHttpClient.java:1035)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeWithTimer(AmazonHttpClient.java:721)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.execute(AmazonHttpClient.java:704)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.access$500(AmazonHttpClient.java:672)
at com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl.execute(AmazonHttpClient.java:654)
at com.amazonaws.http.AmazonHttpClient.execute(AmazonHttpClient.java:518)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.doInvoke(AmazonCloudWatchClient.java:965)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.listMetrics(AmazonCloudWatchClient.java:684)
at io.prometheus.cloudwatch.CloudWatchCollector.getDimensions(CloudWatchCollector.java:188)
at io.prometheus.cloudwatch.CloudWatchCollector.scrape(CloudWatchCollector.java:329)
at java.util.Collections.list(Collections.java:3688)
at io.prometheus.client.exporter.MetricsServlet.doGet(MetricsServlet.java:40)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:735)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:116)
at org.eclipse.jetty.server.Server.handle(Server.java:365)
at org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:485)
at org.eclipse.jetty.server.AbstractHttpConnection.headerComplete(AbstractHttpConnection.java:926)
at org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.headerComplete(AbstractHttpConnection.java:988)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:635)
at org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:51)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:543)
at java.lang.Thread.run(Thread.java:745)
at com.sun.org.apache.xerces.internal.impl.XMLStreamReaderImpl.next(XMLStreamReaderImpl.java:599)
at com.amazonaws.transform.StaxUnmarshallerContext.nextEvent(StaxUnmarshallerContext.java:220)
com.amazonaws.SdkClientException: Unable to unmarshall response (ParseError at [row,col]:[1039,14]
at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:30)
at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:101)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.handleResponse(AmazonHttpClient.java:1501)
... 41 more
Sep 24, 2017 1:45:24 AM io.prometheus.cloudwatch.CloudWatchCollector collect
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.executeOneRequest(AmazonHttpClient.java:1222)
at com.amazonaws.http.AmazonHttpClient$RequestExecutor.doExecute(AmazonHttpClient.java:747)
at com.amazonaws.services.cloudwatch.AmazonCloudWatchClient.invoke(AmazonCloudWatchClient.java:941)
at io.prometheus.cloudwatch.CloudWatchCollector.collect(CloudWatchCollector.java:410)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.findNextElement(CollectorRegistry.java:143)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:158)
at io.prometheus.client.CollectorRegistry$MetricFamilySamplesEnumeration.nextElement(CollectorRegistry.java:128)
at io.prometheus.client.exporter.common.TextFormat.write004(TextFormat.java:22)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:848)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:648)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1072)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:382)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1006)
at org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:82)
at org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:627)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:608)
Caused by: javax.xml.stream.XMLStreamException: ParseError at [row,col]:[1039,14]
Message: Read timed out
at com.sun.xml.internal.stream.XMLEventReaderImpl.peek(XMLEventReaderImpl.java:275)
at com.amazonaws.services.cloudwatch.model.transform.DimensionStaxUnmarshaller.unmarshall(DimensionStaxUnmarshaller.java:40)
at com.amazonaws.services.cloudwatch.model.transform.ListMetricsResultStaxUnmarshaller.unmarshall(ListMetricsResultStaxUnmarshaller.java:54)
at com.amazonaws.http.StaxResponseHandler.handle(StaxResponseHandler.java:43)
at com.amazonaws.http.response.AwsResponseHandlerAdapter.handle(AwsResponseHandlerAdapter.java:70)
Thanks!
Issue Analytics
- State:
- Created 6 years ago
- Comments: 18 (11 by maintainers)
@brian-brazil I’m sorry, I thought we were talking about IT. “Somewhere else” sounds like “works on my computer” in this situation. I described my situation, the symptoms, and a temporary solution. Here are the changes I made to the exporter (some of them are overkill, I know). They fixed cloudwatch_exporter’s non-replying behavior and allowed the error to be triggered earlier. What evidence do you need?

Hi @ivanfoo! Can you share your findings? We have been facing much the same issue recently. The configuration hasn’t been changed on our end.
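
For readers hitting the same hang: the exporter changes mentioned above are not reproduced in this thread, so the following is only a minimal sketch of the general idea of making a stuck scrape fail early instead of blocking forever. It uses nothing but the Java standard library. The class name BoundedScrape and the helper callWithDeadline are hypothetical and are not part of cloudwatch_exporter or the AWS SDK.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; not code from cloudwatch_exporter.
final class BoundedScrape {

    // Runs one scrape attempt on a worker thread and gives up after timeoutMillis.
    static <T> T callWithDeadline(Callable<T> scrape, long timeoutMillis) throws Exception {
        ExecutorService worker = Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "scrape-worker");
            t.setDaemon(true); // a stuck worker must not keep the JVM alive
            return t;
        });
        Future<T> result = worker.submit(scrape);
        try {
            return result.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            result.cancel(true); // interrupt the worker; the caller sees the timeout
            throw e;
        } catch (ExecutionException e) {
            // Surface the scrape's own failure rather than the wrapper.
            Throwable cause = e.getCause();
            if (cause instanceof Exception) {
                throw (Exception) cause;
            }
            throw e;
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // A scrape that answers quickly returns its value.
        String fast = callWithDeadline(() -> "ok", 1000);
        System.out.println("fast scrape returned: " + fast);

        // A scrape that hangs is reported as a timeout after 100 ms.
        try {
            callWithDeadline(() -> {
                Thread.sleep(5000); // stands in for a read that never completes
                return "late";
            }, 100);
        } catch (TimeoutException e) {
            System.out.println("scrape timed out");
        }
    }
}
```

A deadline like this only bounds how long the caller waits. Interrupting a thread does not necessarily unblock a socket read, so the HTTP client's own connection and socket timeouts still need to be set to sensible values for the underlying request to be abandoned.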