response.css can't find elements that are there
Hello,
When I run
scrapy shell https://www.belezanaweb.com.br/homens/
and then execute a simple selector, nothing is returned:
In [2]: response.css('div.item').extract()
Out[2]: []
Yet the element is present on the page.
It isn't loaded with JavaScript, because it also shows up in the raw HTML returned by curl:
curl https://www.belezanaweb.com.br/homens/ | less
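To double-check what curl shows, and to see whether the div tags in that raw HTML are balanced, a minimal sketch using Python's standard-library html.parser can help. HTMLParser reports tags as a stream of events instead of building a tree, so unclosed tags cannot hide anything from it. DivAudit and audit are hypothetical helper names for illustration; you would feed it the HTML saved from curl (or response.text from the Scrapy shell).

```python
from html.parser import HTMLParser


class DivAudit(HTMLParser):
    """Stream through raw HTML and record <div> nesting without building a tree."""

    def __init__(self) -> None:
        super().__init__()
        self.depth = 0      # number of <div> tags currently open
        self.max_depth = 0  # deepest <div> nesting seen so far
        self.items = 0      # <div> tags whose class list contains "item"

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        self.depth += 1
        self.max_depth = max(self.max_depth, self.depth)
        # attrs is a list of (name, value) pairs; value may be None
        classes = (dict(attrs).get("class") or "").split()
        if "item" in classes:
            self.items += 1

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1


def audit(html: str) -> DivAudit:
    parser = DivAudit()
    parser.feed(html)
    parser.close()
    return parser


if __name__ == "__main__":
    # Two div.item elements, two of the three <div> tags never closed
    report = audit('<div class="list"><div class="item">A<div class="item">B</div>')
    print(report.items, report.depth, report.max_depth)  # 2 2 3
```

A final depth above zero means some div tags were never closed, which is the kind of markup error reported further down in this thread.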
Could it be a parser bug? I'm running Scrapy 1.1.0rc4.
Thank you
Issue Analytics
- Created 7 years ago
- Reactions: 2
- Comments: 5 (2 by maintainers)
Top GitHub Comments
This is an lxml problem with unclosed tags, but it could be worked around using something like BeautifulSoup.

The target HTML has some errors, such as unclosed div tags. I downloaded the raw HTML, fixed the tags, and now Scrapy's response.css processes it fine. Is there any configuration in the Scrapy parser to ignore or auto-close those unclosed tags?

Thank you