scrapy parse emits ANSI color sequences in the Windows terminal

Description

I am trying to use the scrapy parse command in cmd (Anaconda environment), but when it logs the Scraped Items and Requests sections, the output is full of garbled characters, as shown under Additional context below. I don't know what causes it, but it does not seem to be related to character encoding.

Steps to Reproduce

Run the command:

>>> scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/

The quotes3 spider is shown below (Additional context).

Expected behavior: output without garbled characters

Actual behavior: garbled characters appear in the Scraped Items and Requests output

Reproduces how often: every time

Versions

Scrapy       : 1.8.0
lxml         : 4.4.1.0
libxml2      : 2.9.9
cssselect    : 1.1.0
parsel       : 1.5.2
w3lib        : 1.21.0
Twisted      : 19.10.0
Python       : 3.7.4 (default, Aug 9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)]
pyOpenSSL    : 19.0.0 (OpenSSL 1.1.1d 10 Sep 2019)
cryptography : 2.7
Platform     : Windows-10-10.0.18362-SP0

Additional context

spider.py

import scrapy


class QuotesSpider3(scrapy.Spider):
    name = 'quotes3'
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):
        for quote in response.xpath('//div[@class="quote"]'):
            yield {
                'text':
                quote.xpath('span[@class="text"]/text()').get(),
                'author':
                quote.xpath('.//small[@class="author"]/text()').get(),
                'tags':
                quote.xpath(
                    'div[@class="tags"]//a[@class="tag"]/text()').getall()
            }

        next_page = response.xpath('//li[@class="next"]//a/@href').get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

(base) D:\scrapy_project>chcp
Active code page: 65001

(base) D:\scrapy_project>scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: demo1)
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.4 (default, Aug  9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1d  10 Sep 2019), cryptography 2.7, Platform Windows-10-10.0.18362-SP0
2020-03-02 18:27:36 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'demo1', 'NEWSPIDER_MODULE': 'demo1.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['demo1.spiders']}
2020-03-02 18:27:36 [scrapy.extensions.telnet] INFO: Telnet Password: 67f8ee92329c4e99
2020-03-02 18:27:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-03-02 18:27:39 [scrapy.core.engine] INFO: Spider opened
2020-03-02 18:27:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-02 18:27:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-03-02 18:27:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2020-03-02 18:27:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-02 18:27:52 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 453,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2719,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'elapsed_time_seconds': 12.789971,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 3, 2, 10, 27, 52, 44220),
 'log_count/DEBUG': 2,
 'log_count/INFO': 10,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/404': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2020, 3, 2, 10, 27, 39, 254249)}
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Spider closed (finished)

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[{[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mchange[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mdeep-thoughts[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mthinking[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mworld[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“The world as we have created it is a process of our thinking. It [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mcannot be changed without changing our thinking.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mJ.K. Rowling[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mabilities[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mchoices[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“It is our choices, Harry, that show what we truly are, far more [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mthan our abilities.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlife[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlive[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mmiracle[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mmiracles[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“There are only two ways to live your life. One is as though [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mnothing is a miracle. The other is as though everything is a [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mmiracle.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mJane Austen[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33maliteracy[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mbooks[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mclassic[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mhumor[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“The person, be it gentleman or lady, who has not pleasure in a [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mgood novel, must be intolerably stupid.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mMarilyn Monroe[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mbe-yourself[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m"[39;49;00m[33m“Imperfection is beauty, madness is genius and it[39;49;00m[33m'[39;49;00m[33ms better to be [39;49;00m[33m"[39;49;00m
          [33m'[39;49;00m[33mabsolutely ridiculous than absolutely boring.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAlbert Einstein[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33madulthood[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33msuccess[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mvalue[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“Try not to become a man of success. Rather become a man of [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mvalue.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mAndré Gide[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mlife[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mlove[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“It is better to be hated for what you are than to be loved for [39;49;00m[33m'[39;49;00m
          [33m'[39;49;00m[33mwhat you are not.”[39;49;00m[33m'[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mThomas A. Edison[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33medison[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mfailure[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33minspirational[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mparaphrased[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m"[39;49;00m[33m“I have not failed. I[39;49;00m[33m'[39;49;00m[33mve just found 10,000 ways that won[39;49;00m[33m'[39;49;00m[33mt work.”[39;49;00m[33m"[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mEleanor Roosevelt[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mmisattributed-eleanor-roosevelt[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“A woman is like a tea bag; you never know how strong it is until [39;49;00m[33m'[39;49;00m
          [33m"[39;49;00m[33mit[39;49;00m[33m'[39;49;00m[33ms in hot water.”[39;49;00m[33m"[39;49;00m},
 {[33m'[39;49;00m[33mauthor[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33mSteve Martin[39;49;00m[33m'[39;49;00m,
  [33m'[39;49;00m[33mtags[39;49;00m[33m'[39;49;00m: [[33m'[39;49;00m[33mhumor[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33mobvious[39;49;00m[33m'[39;49;00m, [33m'[39;49;00m[33msimile[39;49;00m[33m'[39;49;00m],
  [33m'[39;49;00m[33mtext[39;49;00m[33m'[39;49;00m: [33m'[39;49;00m[33m“A day without sunshine is like, you know, night.”[39;49;00m[33m'[39;49;00m}]

# Requests  -----------------------------------------------------------------
[<GET http://quotes.toscrape.com/page/[34m2[39;49;00m/>]

Top GitHub Comments

akshaysharmajs commented, Jul 27, 2020

Should we close this issue?

wRAR commented, Mar 2, 2020

Pass --nocolour. While Pygments seems to have some support for the Windows terminal, Scrapy uses TerminalFormatter which is specifically for ANSI sequences. This is probably a bug in Scrapy.
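
For anyone wondering what the garbled fragments are: pieces like [33m and [39;49;00m in the output above look like ANSI SGR colour sequences, assuming each one is preceded by the ESC character (0x1B), which is not visible in the paste. The snippet below is only a rough sketch of stripping such sequences from already captured text. It is not part of Scrapy, and strip_ansi and ANSI_SGR are hypothetical names made up for this example.

```python
import re

# Matches ANSI SGR (colour/style) sequences such as ESC[33m or ESC[39;49;00m.
# Other kinds of escape sequences (cursor movement etc.) are not handled here.
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")


def strip_ansi(text: str) -> str:
    """Return text with ANSI SGR colour sequences removed."""
    return ANSI_SGR.sub("", text)


# One coloured token, written the way the log above suggests it was emitted
# (assumption: the ESC byte precedes every "[...m" fragment).
sample = "\x1b[33m'\x1b[39;49;00m\x1b[33mauthor\x1b[39;49;00m\x1b[33m'\x1b[39;49;00m"
print(strip_ansi(sample))  # prints 'author'
```

This only helps when post-processing saved output. For the command itself, passing --nocolour as suggested above avoids emitting the sequences in the first place.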
