question-mark
Stuck on an issue?

Lightrun Answers was designed to reduce the constant googling that comes with debugging 3rd party libraries. It collects links to all the places you might be looking at while hunting down a tough bug.

And, if you’re still stuck at the end, we’re happy to hop on a call to see how we can help out.

Scrapy uses a lot of CPU on our server, 40% is the minimum (when running 1 spider) and most of the time it goes up to 99% (running multiple spiders at the same time). We’re using a DigitalOcean droplet with 1 CPU.

Is this normal? I tried already several things:

  • add sleep(0.1) to Pipelines
  • disabling Pipelines
  • Using Scrapyd with job persistance.
  • set the amount of jobs per cpu with settings like below. scrapy.cfg:
[scrapyd]    
max_proc = 0    
max_proc_per_cpu = 2
Which didn't help either. These are my stats when running a spider:

Nothing helps. These are my stats when I run one scraper:

  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 3221 scrapy    20   0  836724  75568   8432 R 40.5  7.4   1:04.31 python
 3231 scrapy    20   0  836724  75568   8432 S  1.0  7.4   0:01.40 python
 3234 scrapy    20   0  836724  75568   8432 S  1.0  7.4   0:01.42 python
 3235 scrapy    20   0  836724  75568   8432 S  1.0  7.4   0:01.44 python
 3228 scrapy    20   0  836724  75568   8432 S  0.7  7.4   0:01.43 python

I found out that if I set the variable CONCURRENT_REQUESTS = 1 the CPU stays below the 40% when running 1 spider, but it takes 3 times as long to finish.

I also tried to turn off every extension, middleware and pipeline, but that didn’t reduce the CPU load either. Any ideas?

Issue Analytics

  • State:closed
  • Created 8 years ago
  • Comments:13 (7 by maintainers)

github_iconTop GitHub Comments

1reaction
ErikvdVencommented, May 28, 2015

Thanks a lot!!! Guess it’s working:

I added these settings:

AUTOTHROTTLE_ENABLED = True
AUTOTHROTTLE_START_DELAY = 0.1
AUTOTHROTTLE_MAX_DELAY = 0.5

Result: cpu

0reactions
nramirezuycommented, May 28, 2015

@ErikvdVen yw, if you have anymore doubts you can reach us on scrapy-users or #scrapy IRC.

Read more comments on GitHub >

github_iconTop Results From Across the Web

What is CPU usage, and how to fix high CPU usage
High CPU usage is often the cause of bad performance. Your computer may be affected by this problem if you're experiencing long loading ......
Read more >
How to Fix High CPU Usage (with Pictures) - wikiHow
1. Press .Ctrl+ Shift+Esc to open the Task Manager. This is a utility that monitors and reports on all of the processes and...
Read more >
How to Fix High CPU Usage in Windows - AVG
1. Identify the process that's causing 100% CPU usage · 2. Close unnecessary applications or put them sleep · 3. Give your PC...
Read more >
How to Lower CPU Usage - NinjaOne
In this article, we'll look at how to remedy high CPU usage on your machine or machines. What this article will cover: Why...
Read more >
High CPU usage | DataGrip Documentation - JetBrains
High CPU usage. Last modified: 05 December 2022. Required plugin: Performance Testing (not bundled) ...
Read more >

github_iconTop Related Medium Post

No results found

github_iconTop Related StackOverflow Question

No results found

github_iconTroubleshoot Live Code

Lightrun enables developers to add logs, metrics and snapshots to live code - no restarts or redeploys required.
Start Free

github_iconTop Related Reddit Thread

No results found

github_iconTop Related Hackernoon Post

No results found

github_iconTop Related Tweet

No results found

github_iconTop Related Dev.to Post

No results found

github_iconTop Related Hashnode Post

No results found