scrapy shell - bug in processing escaped URLs

For example, the URL https://altoona.craigslist.org/search/sss?query=cars&sort=rel&searchNearby=1 picks up escaping backslashes when pasted on the command line (this behavior is shell-specific).

The following command gives a 404:

$ scrapy shell 'https://altoona.craigslist.org/search/sss\?query\=cars\&sort\=rel\&searchNearby\=1'

I saw a similar discussion in #1232 - where urllib.parse.unquote() was used to fix it.

In this case even unquote doesn’t seem to work.

>>> unquote('https://altoona.craigslist.org/search/sss\?query\=cars\&sort\=rel\&searchNearby\=1')
'https://altoona.craigslist.org/search/sss\\?query\\=cars\\&sort\\=rel\\&searchNearby\\=1'

Even though URL-character escaping is an issue specific to the shell, wouldn’t it be awesome if Scrapy processed such URLs automatically?
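
A minimal sketch of what such preprocessing might look like, assuming the only unwanted characters are backslashes placed directly before `?`, `=` and `&`. The helper name `strip_shell_escapes` is hypothetical and is not part of Scrapy. It also shows why `unquote()` has no effect here: `unquote()` only decodes percent-escapes such as `%3F`, and this URL contains none.

```python
import re
from urllib.parse import unquote


def strip_shell_escapes(url: str) -> str:
    """Drop a backslash that directly precedes ?, = or &.

    Hypothetical helper for illustration only; not a Scrapy API.
    """
    return re.sub(r"\\([?=&])", r"\1", url)


# Raw string so the backslashes are kept exactly as the shell passed them.
escaped = r"https://altoona.craigslist.org/search/sss\?query\=cars\&sort\=rel\&searchNearby\=1"

# unquote() leaves the string unchanged because there are no %XX sequences.
print(unquote(escaped) == escaped)

# The helper removes only the backslashes in front of ?, = and &.
print(strip_shell_escapes(escaped))
```

The regular expression is deliberately narrow, so a backslash anywhere else in the URL is left alone. Whether Scrapy should apply something like this automatically is exactly the open question of this issue.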

Issue Analytics

  • State: open
  • Created 7 years ago
  • Comments: 6 (6 by maintainers)

Top GitHub Comments

1 reaction
harshasrinivas commented, Mar 6, 2017

Sure, I’ll send a PR right away.

0 reactions
harshasrinivas commented, Mar 8, 2017

Hello @rolando, I tried it on oh-my-zsh and basic zsh as well. Opening double-quotes beforehand does not seem to help. (Tested it on both iTerm2 and Terminal.)


