Scrapy under Python 3 is slower than under Python 2
The bookworm benchmark from https://github.com/scrapy/scrapy-bench/ (see also https://medium.com/@vermaparth/parth-gsoc-f5556ffa4025) shows about a 15% slowdown, while the more synthetic scrapy bench shows a 2x slowdown: https://github.com/scrapy/scrapy/pull/3050#issuecomment-353863711
Issue Analytics
- State:
- Created 6 years ago
- Comments:22 (22 by maintainers)
Top GitHub Comments
I think I would start by comparing profiles (that is, running under a profiler) under Py2.7 and Py3.6 for several benchmarks and trying to spot where most of the difference comes from. For benchmarks, I think it makes sense to check several of them, because some might show more of a difference and some will be easier to analyze. For profilers, if you already have a preference, go with it. If not, I would suggest using the built-in cProfile with a visualization backend (e.g. snakeviz), and vmprof + vmprof.com for visualization. It’s good to have several different profilers because this allows cross-checking the profiling results.
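A minimal sketch of the cProfile part of that suggestion, assuming the benchmark can be called as a plain Python function. `workload` and `profile_to_file` are hypothetical names used for illustration only, and `workload` is a stand-in rather than an actual Scrapy crawl. The idea is to run the same script under each interpreter and compare the resulting profile files.

```python
import cProfile
import pstats
import sys


def workload():
    # Hypothetical stand-in for a benchmark run; replace with the real entry point.
    total = 0
    for i in range(200000):
        total += len(str(i))
    return total


def profile_to_file(func, path):
    """Run func under cProfile and write the raw stats to path."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    profiler.dump_stats(path)
    return path


if __name__ == "__main__":
    # Name the output after the interpreter version so runs do not overwrite each other.
    out = "profile-py%d%d.prof" % sys.version_info[:2]
    profile_to_file(workload, out)
    # Print the ten entries with the highest cumulative time.
    pstats.Stats(out).sort_stats("cumulative").print_stats(10)
```

The `.prof` file written here can also be opened in a visualizer such as snakeviz (for example `snakeviz profile-py36.prof`) to compare the two interpreters side by side.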
I tried running scrapy bench again and again, and the results are really unpredictable. However, the speed difference between Python 2 and Python 3 is still really high (about 30%-40%). Then again, the result I copied from the last line of the log is pretty weird. So I added a report line that divides the number of pages crawled by the elapsed time, then ran the benchmark again and got this result: Python 2:
Python 3:
As you can see, the results are basically the same. But I noticed that the spider gets much slower over time. I will take the log on Python 3 as an example:
This also happens on Python 2, but to a lesser extent. I guess that’s where the difference comes from.
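To illustrate the two measurements discussed above, here is a rough sketch that computes the overall rate (pages crawled divided by elapsed time) and the rate per interval, which makes a slowdown over time visible. The function names and the sample numbers are hypothetical and are not taken from any of the logs in this thread.

```python
def overall_rate(pages, elapsed_seconds):
    """Average pages per second over the whole run."""
    return pages / float(elapsed_seconds)


def interval_rates(samples):
    """Pages per second between consecutive samples.

    samples is a list of (elapsed_seconds, cumulative_pages) tuples
    in increasing time order.
    """
    rates = []
    for (t0, p0), (t1, p1) in zip(samples, samples[1:]):
        rates.append((p1 - p0) / float(t1 - t0))
    return rates


if __name__ == "__main__":
    # Hypothetical samples, made up for illustration only.
    samples = [(0, 0), (10, 500), (20, 900), (30, 1200)]
    print(overall_rate(1200, 30))   # 40.0
    print(interval_rates(samples))  # [50.0, 40.0, 30.0]
```

Two runs can have the same overall rate while one of them degrades much faster, so comparing the per-interval rates between Python 2 and Python 3 is one way to check whether the gap grows as the crawl goes on.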