
CPU usage creeps with latest Raven client on Tornado

See original GitHub issue

After upgrading one of our (SeatGeek) production services to use the latest version of Raven (raven==5.24.3 at the time), we experienced consistent CPU usage spikes to 100% that eventually crashed the API workers.


After some profiling with New Relic, it appears to be somehow related to the new(ish) breadcrumbs feature.

[Screenshot, 2016-08-31: profiling results]

Let me know if I can do anything else to help debug this.
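
A minimal mitigation sketch (an editorial addition, not part of the original report), assuming raven-python’s Tornado integration and the enable_breadcrumbs client flag that also appears in the reproduction script further down: breadcrumb collection can be switched off when the client is constructed, which commenters below report resolves the growth.

import tornado.web
from raven.contrib.tornado import AsyncSentryClient

SENTRY_DSN = "https://public:secret@sentry.example.com/1"  # placeholder DSN, not a real project

application = tornado.web.Application([])
# Workaround reported later in this thread: disable breadcrumb collection,
# the suspected source of the CPU and memory growth.
application.sentry_client = AsyncSentryClient(
    SENTRY_DSN,
    enable_breadcrumbs=False,
)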

Issue Analytics

  • State: open
  • Created: 7 years ago
  • Comments: 16 (4 by maintainers)

Top GitHub Comments

1 reaction
BarryJRowe commented, May 11, 2018

We’ve been having what seems to be this issue on our servers for the past few months, causing a lot of server instability and headaches while trying to figure out what was going on. For us it seems to be triggered when certain timeouts pile up, which causes raven to start consuming a lot of CPU and memory, causing more timeouts and eventually hanging our server. I’ve created some example code to reproduce the problem:

import sys
import time

import tornado.gen
import tornado.httpclient
import tornado.httpserver
import tornado.ioloop
import tornado.web
from raven.contrib.tornado import AsyncSentryClient, SentryMixin

#modify the time.time function so we can virtually fast-forward time.
old_time = time.time
inc_time = 0
def new_time(*args, **kwargs):
    return old_time()+inc_time
time.time = new_time

http_client = tornado.httpclient.AsyncHTTPClient()

class MainHandler(SentryMixin, tornado.web.RequestHandler):
    @tornado.gen.coroutine
    def get(self):
        global inc_time
        #simulate a long running request here by adding 30 seconds to time.time
        inc_time +=30
        #time.sleep(5)
        # deliberately raise so the handler reports an exception to Sentry
        raise Exception("simulated failure")

application = tornado.web.Application([
    (r"/", MainHandler),
])
application.sentry_client = AsyncSentryClient(
    sentry_url_here,
#    enable_breadcrumbs=False,
)

def main():
    if sys.argv[1] == "0":
        application.listen(6000)
        tornado.ioloop.IOLoop.instance().start()
    elif sys.argv[1] == "1":
        count = 0
        while True:
            count+=1
            try:
                tornado.httpclient.HTTPClient().fetch("http://localhost:6000/")
            except KeyboardInterrupt:
                raise
            except:
                pass
            print count

if __name__=='__main__':
    main()

Run python script.py 0 in one terminal and python script.py 1 in another to reproduce the results. When count reaches about 30 in the while loop, the server really starts to slow to a crawl. The CPU usage shows up in the Client.encode function of base.py, as the size of the JSON it’s trying to output reaches hundreds of megabytes. Turning off breadcrumbs seems to fix the issue.

In the script I’ve modified the time.time function to simulate a long-running, blocking operation. This is not entirely fair, since it doesn’t give any extra time for garbage collection, and using time.sleep(30) instead seems to make the problem either not happen or happen very slowly. I’ve found that in this example, using time.sleep(5) instead will reproduce the problem without modifying time.time at all (in this case, by count = 20 the size of the JSON reaches 100 kB; by count = 60 it’s about 70 MB; at count = 86 it’s 1.4 GB and the single Tornado process is using 21 GB of memory).

Strangely enough, when the time.sleep(5) and the modified time.time are removed, there is no issue and the JSON size doesn’t seem to go above 35 kB.

The version of raven used for the above was 6.7.0, but the problem still seems to exist in older versions, even back to 5.15.0, though different versions behave slightly differently.
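
As a hedged diagnostic sketch (not from the original comment), the payload growth described above can be confirmed by subclassing the client and logging the size of each encoded event; this assumes only that Client.encode, named above as the hot spot, returns the serialized payload.

import logging

from raven.contrib.tornado import AsyncSentryClient

class SizeLoggingSentryClient(AsyncSentryClient):
    # Log how large each outgoing Sentry payload is before it is sent.
    def encode(self, data):
        encoded = super(SizeLoggingSentryClient, self).encode(data)
        logging.warning("sentry payload: %d bytes", len(encoded))
        return encoded

Passing SizeLoggingSentryClient(sentry_url_here) to the application in place of AsyncSentryClient makes the runaway payloads visible in the logs while the reproduction script runs.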

0 reactions
fabiopedrosa commented, May 28, 2018

I’m having exactly the same issue as @BarryJRowe; we had to stop using Sentry for our services. I’ve been looking into it, and AsyncSentryClient was trying to send multiple requests with 200 MB of data, so processes were growing to several gigabytes of memory usage just for Sentry processing!
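
A small monitoring sketch (an addition, not from the thread) that would surface this kind of growth early: log the worker’s peak RSS from the IOLoop on a timer, using only the standard library and Tornado’s PeriodicCallback.

import logging
import resource

import tornado.ioloop

def log_peak_rss():
    # ru_maxrss is reported in kilobytes on Linux (bytes on macOS).
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    logging.warning("worker peak RSS: %s kB", peak)

# Log once a minute alongside the application's normal IOLoop.
tornado.ioloop.PeriodicCallback(log_peak_rss, 60 * 1000).start()
tornado.ioloop.IOLoop.instance().start()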

Read more comments on GitHub >

