scrapy parse emits ANSI color sequences in the Windows terminal
Description
I tried to use the scrapy parse command in cmd (Anaconda env), but when it logs Scraped Items and Requests, the output is full of garbled code, shown below (Additional context). I don't know what causes it, but it seems to have nothing to do with character encoding.
Steps to Reproduce
Run the command:
>>> scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/
The quotes3 spider is shown below (Additional context).
Expected behavior: Scraped Items and Requests are printed without garbled code
Actual behavior: garbled code appears in the Scraped Items and Requests output
Reproduces how often: every time
Versions
Scrapy : 1.8.0
lxml : 4.4.1.0
libxml2 : 2.9.9
cssselect : 1.1.0
parsel : 1.5.2
w3lib : 1.21.0
Twisted : 19.10.0
Python : 3.7.4 (default, Aug 9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)]
pyOpenSSL : 19.0.0 (OpenSSL 1.1.1d 10 Sep 2019)
cryptography : 2.7
Platform : Windows-10-10.0.18362-SP0
Additional context
spider.py
import scrapy


class QuotesSpider3(scrapy.Spider):
    name = 'quotes3'
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):
        # Yield one item per quote block on the page.
        for quote in response.xpath('//div[@class="quote"]'):
            yield {
                'text':
                    quote.xpath('span[@class="text"]/text()').get(),
                'author':
                    quote.xpath('.//small[@class="author"]/text()').get(),
                'tags':
                    quote.xpath(
                        'div[@class="tags"]//a[@class="tag"]/text()').getall()
            }

        # Follow the pagination link, if there is one.
        next_page = response.xpath('//li[@class="next"]//a/@href').get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)
(base) D:\scrapy_project>chcp
Active code page: 65001
(base) D:\scrapy_project>scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: demo1)
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.4 (default, Aug 9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1d 10 Sep 2019), cryptography 2.7, Platform Windows-10-10.0.18362-SP0
2020-03-02 18:27:36 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'demo1', 'NEWSPIDER_MODULE': 'demo1.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['demo1.spiders']}
2020-03-02 18:27:36 [scrapy.extensions.telnet] INFO: Telnet Password: 67f8ee92329c4e99
2020-03-02 18:27:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
'scrapy.extensions.telnet.TelnetConsole',
'scrapy.extensions.logstats.LogStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
'scrapy.downloadermiddlewares.retry.RetryMiddleware',
'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
'scrapy.spidermiddlewares.referer.RefererMiddleware',
'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-03-02 18:27:39 [scrapy.core.engine] INFO: Spider opened
2020-03-02 18:27:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-02 18:27:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-03-02 18:27:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2020-03-02 18:27:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-02 18:27:52 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 453,
'downloader/request_count': 2,
'downloader/request_method_count/GET': 2,
'downloader/response_bytes': 2719,
'downloader/response_count': 2,
'downloader/response_status_count/200': 1,
'downloader/response_status_count/404': 1,
'elapsed_time_seconds': 12.789971,
'finish_reason': 'finished',
'finish_time': datetime.datetime(2020, 3, 2, 10, 27, 52, 44220),
'log_count/DEBUG': 2,
'log_count/INFO': 10,
'response_received_count': 2,
'robotstxt/request_count': 1,
'robotstxt/response_count': 1,
'robotstxt/response_status_count/404': 1,
'scheduler/dequeued': 1,
'scheduler/dequeued/memory': 1,
'scheduler/enqueued': 1,
'scheduler/enqueued/memory': 1,
'start_time': datetime.datetime(2020, 3, 2, 10, 27, 39, 254249)}
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Spider closed (finished)
>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items ------------------------------------------------------------
[{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mchange[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mdeep-thoughts[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mthinking[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mworld[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“The world as we have created it is a process of our thinking. It [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mcannot be changed without changing our thinking.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mJ.K. Rowling[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mabilities[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mchoices[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“It is our choices, Harry, that show what we truly are, far more [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mthan our abilities.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlife[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlive[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mmiracle[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mmiracles[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“There are only two ways to live your life. One is as though [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mnothing is a miracle. The other is as though everything is a [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mmiracle.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mJane Austen[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33maliteracy[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mbooks[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mclassic[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mhumor[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“The person, be it gentleman or lady, who has not pleasure in a [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mgood novel, must be intolerably stupid.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mMarilyn Monroe[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mbe-yourself[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m"[39;49;00m[33m“Imperfection is beauty, madness is genius and it[39;49;00m[33m'[39;49;00m[33ms better to be [39;49;00m[33m"[39;49;00m
[33m'[39;49;00m[33mabsolutely ridiculous than absolutely boring.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33madulthood[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33msuccess[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mvalue[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“Try not to become a man of success. Rather become a man of [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mvalue.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAndré Gide[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mlife[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlove[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“It is better to be hated for what you are than to be loved for [39;49;00m[33m'[39;49;00m
[33m'[39;49;00m[33mwhat you are not.”[39;49;00m[33m'[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mThomas A. Edison[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33medison[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mfailure[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mparaphrased[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m"[39;49;00m[33m“I have not failed. I[39;49;00m[33m'[39;49;00m[33mve just found 10,000 ways that won[39;49;00m[33m'[39;49;00m[33mt work.”[39;49;00m[33m"[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mEleanor Roosevelt[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mmisattributed-eleanor-roosevelt[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“A woman is like a tea bag; you never know how strong it is until [39;49;00m[33m'[39;49;00m
[33m"[39;49;00m[33mit[39;49;00m[33m'[39;49;00m[33ms in hot water.”[39;49;00m[33m"[39;49;00m},
{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mSteve Martin[39;49;00m[33m'[39;49;00m,
[33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mhumor[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mobvious[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33msimile[39;49;00m[33m'[39;49;00m],
[33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“A day without sunshine is like, you know, night.”[39;49;00m[33m'[39;49;00m}]
# Requests -----------------------------------------------------------------
[<GET http://quotes.toscrape.com/page/[34m2[39;49;00m/>]
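For reference, fragments such as [33m, [34m and [39;49;00m in the output above look like ANSI SGR color sequences (33 = yellow foreground, 34 = blue foreground, 39;49;00 = default colors plus reset) with the leading escape byte not shown. A minimal sketch of stripping them from captured text follows. The helper name strip_sgr is hypothetical and not part of Scrapy, and the pattern assumes the escape byte may or may not be present in the captured text.

```python
import re

# Matches an SGR sequence such as ESC[33m or ESC[39;49;00m.
# The ESC byte (\x1b) is optional because consoles that do not
# interpret it may drop it or render it as a placeholder glyph.
# Caveat: with the ESC byte optional, ordinary text that happens to
# look like "[12m" would be stripped as well.
SGR_PATTERN = re.compile(r"\x1b?\[[0-9]+(?:;[0-9]+)*m")


def strip_sgr(text):
    """Return text with SGR color sequences removed (hypothetical helper)."""
    return SGR_PATTERN.sub("", text)


if __name__ == "__main__":
    garbled = "[<GET http://quotes.toscrape.com/page/[34m2[39;49;00m/>]"
    print(strip_sgr(garbled))
```

This only cleans up output that has already been captured; it does not change what scrapy parse writes to the console.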
Top GitHub Comments
Should we close this issue?
Pass --nocolour. While Pygments seems to have some support for the Windows terminal, Scrapy uses TerminalFormatter, which is specifically for ANSI sequences. This is probably a bug in Scrapy.
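One way a tool could avoid emitting ANSI sequences to a console that will not interpret them is to check the output stream before colorizing. The sketch below is only an illustration of that idea, not Scrapy's actual code. The function name should_colorize is hypothetical, and treating the ANSICON and WT_SESSION environment variables as signs of an ANSI-capable Windows console is an assumed heuristic.

```python
import os
import sys


def should_colorize(stream, platform=sys.platform, environ=None):
    """Guess whether ANSI color sequences are safe to write to stream.

    Hypothetical helper: the Windows check is a heuristic assumption,
    not a documented Scrapy or Pygments behavior.
    """
    environ = os.environ if environ is None else environ

    # Never colorize when output is piped or redirected to a file.
    isatty = getattr(stream, "isatty", None)
    if isatty is None or not isatty():
        return False

    # Assume non-Windows terminals understand ANSI sequences.
    if platform != "win32":
        return True

    # On Windows, only colorize when the environment hints at ANSI support.
    return "ANSICON" in environ or "WT_SESSION" in environ


if __name__ == "__main__":
    print(should_colorize(sys.stdout))
```

With a check like this, a plain cmd window would fall back to uncolored output, which matches what passing --nocolour does by hand.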