scrapy parse emits ANSI color sequences in the Windows terminal

Description

I ran the scrapy parse command in cmd (Anaconda environment). When it prints the Scraped Items and Requests sections, the output is full of garbled characters, as shown under Additional context below. I don't know the cause, but it does not seem to be related to character encoding.

Steps to Reproduce

Run the command:

scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/

The quotes3 spider is shown under Additional context below.

Expected behavior: the output contains no garbled characters.

Actual behavior: garbled characters appear in the output.

Reproduces how often: every time

Versions

Scrapy : 1.8.0
lxml : 4.4.1.0
libxml2 : 2.9.9
cssselect : 1.1.0
parsel : 1.5.2
w3lib : 1.21.0
Twisted : 19.10.0
Python : 3.7.4 (default, Aug 9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)]
pyOpenSSL : 19.0.0 (OpenSSL 1.1.1d 10 Sep 2019)
cryptography : 2.7
Platform : Windows-10-10.0.18362-SP0

Additional context

spider.py

import scrapy


class QuotesSpider3(scrapy.Spider):
    name = 'quotes3'
    start_urls = ['http://quotes.toscrape.com/page/1/']

    def parse(self, response):
        # Yield one item per quote block on the page.
        for quote in response.xpath('//div[@class="quote"]'):
            yield {
                'text':
                quote.xpath('span[@class="text"]/text()').get(),
                'author':
                quote.xpath('.//small[@class="author"]/text()').get(),
                'tags':
                quote.xpath(
                    'div[@class="tags"]//a[@class="tag"]/text()').getall()
            }

        # Follow the "next" pagination link, if there is one.
        next_page = response.xpath('//li[@class="next"]//a/@href').get()
        if next_page is not None:
            next_page = response.urljoin(next_page)
            yield scrapy.Request(next_page, callback=self.parse)

Console output

(base) D:\scrapy_project>chcp
Active code page: 65001

(base) D:\scrapy_project>scrapy parse --spider=quotes3 http://quotes.toscrape.com/page/1/
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Scrapy 1.8.0 started (bot: demo1)
2020-03-02 18:27:36 [scrapy.utils.log] INFO: Versions: lxml 4.4.1.0, libxml2 2.9.9, cssselect 1.1.0, parsel 1.5.2, w3lib 1.21.0, Twisted 19.10.0, Python 3.7.4 (default, Aug  9 2019, 18:22:51) [MSC v.1915 32 bit (Intel)], pyOpenSSL 19.0.0 (OpenSSL 1.1.1d  10 Sep 2019), cryptography 2.7, Platform Windows-10-10.0.18362-SP0
2020-03-02 18:27:36 [scrapy.crawler] INFO: Overridden settings: {'BOT_NAME': 'demo1', 'NEWSPIDER_MODULE': 'demo1.spiders', 'ROBOTSTXT_OBEY': True, 'SPIDER_MODULES': ['demo1.spiders']}
2020-03-02 18:27:36 [scrapy.extensions.telnet] INFO: Telnet Password: 67f8ee92329c4e99
2020-03-02 18:27:37 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.corestats.CoreStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.logstats.LogStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.robotstxt.RobotsTxtMiddleware',
 'scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2020-03-02 18:27:39 [scrapy.middleware] INFO: Enabled item pipelines:
[]
2020-03-02 18:27:39 [scrapy.core.engine] INFO: Spider opened
2020-03-02 18:27:39 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2020-03-02 18:27:39 [scrapy.extensions.telnet] INFO: Telnet console listening on 127.0.0.1:6023
2020-03-02 18:27:44 [scrapy.core.engine] DEBUG: Crawled (404) <GET http://quotes.toscrape.com/robots.txt> (referer: None)
2020-03-02 18:27:51 [scrapy.core.engine] DEBUG: Crawled (200) <GET http://quotes.toscrape.com/page/1/> (referer: None)
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Closing spider (finished)
2020-03-02 18:27:52 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 453,
 'downloader/request_count': 2,
 'downloader/request_method_count/GET': 2,
 'downloader/response_bytes': 2719,
 'downloader/response_count': 2,
 'downloader/response_status_count/200': 1,
 'downloader/response_status_count/404': 1,
 'elapsed_time_seconds': 12.789971,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2020, 3, 2, 10, 27, 52, 44220),
 'log_count/DEBUG': 2,
 'log_count/INFO': 10,
 'response_received_count': 2,
 'robotstxt/request_count': 1,
 'robotstxt/response_count': 1,
 'robotstxt/response_status_count/404': 1,
 'scheduler/dequeued': 1,
 'scheduler/dequeued/memory': 1,
 'scheduler/enqueued': 1,
 'scheduler/enqueued/memory': 1,
 'start_time': datetime.datetime(2020, 3, 2, 10, 27, 39, 254249)}
2020-03-02 18:27:52 [scrapy.core.engine] INFO: Spider closed (finished)

>>> STATUS DEPTH LEVEL 1 <<<
# Scraped Items  ------------------------------------------------------------
[{'author': 'Albert Einstein',
  'tags': ['change', 'deep-thoughts', 'thinking', 'world'],
  'text': '“The world as we have created it is a process of our thinking. It '
          'cannot be changed without changing our thinking.”'},
 {'author': 'J.K. Rowling',
  'tags': ['abilities', 'choices'],
  'text': '“It is our choices, Harry, that show what we truly are, far more '
          'than our abilities.”'},
 {'author': 'Albert Einstein',
  'tags': ['inspirational', 'life', 'live', 'miracle', 'miracles'],
  'text': '“There are only two ways to live your life. One is as though '
          'nothing is a miracle. The other is as though everything is a '
          'miracle.”'},
 {'author': 'Jane Austen',
  'tags': ['aliteracy', 'books', 'classic', 'humor'],
  'text': '“The person, be it gentleman or lady, who has not pleasure in a '
          'good novel, must be intolerably stupid.”'},
 {'author': 'Marilyn Monroe',
  'tags': ['be-yourself', 'inspirational'],
  'text': "“Imperfection is beauty, madness is genius and it's better to be "
          'absolutely ridiculous than absolutely boring.”'},
 {'author': 'Albert Einstein',
  'tags': ['adulthood', 'success', 'value'],
  'text': '“Try not to become a man of success. Rather become a man of '
          'value.”'},
 {'author': 'André Gide',
  'tags': ['life', 'love'],
  'text': '“It is better to be hated for what you are than to be loved for '
          'what you are not.”'},
 {'author': 'Thomas A. Edison',
  'tags': ['edison', 'failure', 'inspirational', 'paraphrased'],
  'text': "“I have not failed. I've just found 10,000 ways that won't work.”"},
 {'author': 'Eleanor Roosevelt',
  'tags': ['misattributed-eleanor-roosevelt'],
  'text': '“A woman is like a tea bag; you never know how strong it is until '
          "it's in hot water.”"},
 {'author': 'Steve Martin',
  'tags': ['humor', 'obvious', 'simile'],
  'text': '“A day without sunshine is like, you know, night.”'}]

# Requests  -----------------------------------------------------------------
[<GET http://quotes.toscrape.com/page/2/>]

Issue Analytics

  • State: closed
  • Created: 4 years ago
  • Reactions: 1
  • Comments: 8 (7 by maintainers)

Top GitHub Comments

1 reaction
akshaysharmajs commented, Jul 27, 2020

Should we close this issue?

1 reaction
wRAR commented, Mar 2, 2020

Pass --nocolour. While Pygments seems to have some support for the Windows terminal, Scrapy uses TerminalFormatter which is specifically for ANSI sequences. This is probably a bug in Scrapy.
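
To illustrate the comment above, here is a minimal standard-library sketch of the kind of guard a display helper could apply before printing coloured text. It is not Scrapy's or Pygments' actual code: the names strip_ansi, should_colorize and display are hypothetical, the sample string is made up for illustration rather than copied from the log, and the policy of never colouring on win32 is a deliberately simple assumption (newer Windows 10 consoles can interpret these sequences once virtual terminal processing is enabled).

```python
import re
import sys

# Matches ANSI CSI sequences such as "\x1b[33m" (set colour) or "\x1b[39m" (reset).
ANSI_CSI = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Return text with ANSI CSI escape sequences removed."""
    return ANSI_CSI.sub("", text)


def should_colorize(stream=None, platform=None):
    """Hypothetical policy: colour only an interactive, non-Windows stream."""
    stream = sys.stdout if stream is None else stream
    platform = sys.platform if platform is None else platform
    is_tty = hasattr(stream, "isatty") and stream.isatty()
    return is_tty and platform != "win32"


def display(text, stream=None, platform=None):
    """Keep the escapes when colour is appropriate, otherwise strip them."""
    if should_colorize(stream, platform):
        return text
    return strip_ansi(text)


# Illustrative input only: a line as an ANSI formatter might emit it.
colored = "\x1b[33m'author'\x1b[39m: \x1b[33m'Albert Einstein'\x1b[39m"
print(strip_ansi(colored))
```

With a guard like this, a console that cannot interpret the sequences receives plain text, which is the same effect the --nocolour option is meant to achieve by hand.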
