response.css can't find elements that are there

Hello,

When running scrapy shell https://www.belezanaweb.com.br/homens/ and executing a simple selector:

In [2]: response.css('div.item').extract()
Out[2]: []

The element is definitely on the page: [screenshot]

It isn't loaded via JavaScript, because it is also visible in the raw HTML returned by curl (curl https://www.belezanaweb.com.br/homens/ | less):

[screenshot of the curl output]

Could it be a parser bug? I'm running Scrapy 1.1.0rc4. Thank you.
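A quick way to double-check, outside the browser, that the elements really are in the server's HTML (the same thing the curl check shows) is to count matching start tags with Python's standard-library html.parser. It only tokenizes the markup and never builds a tree, so unclosed tags cannot hide anything from it. This is a minimal sketch; DivClassCounter and count_div_class are hypothetical names, not Scrapy APIs.

```python
from html.parser import HTMLParser


class DivClassCounter(HTMLParser):
    """Hypothetical helper: counts <div> start tags whose class list contains a given class."""

    def __init__(self, wanted_class):
        super().__init__()
        self.wanted_class = wanted_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names; attrs is a list of
        # (name, value) pairs, and value is None for bare attributes.
        if tag != "div":
            return
        for name, value in attrs:
            # Match the CSS meaning of div.item: "item" is one of the
            # whitespace-separated class names.
            if name == "class" and value and self.wanted_class in value.split():
                self.count += 1
                break


def count_div_class(html_text, wanted_class="item"):
    # Stream the raw markup through the tokenizer; no tree is built,
    # so unbalanced tags do not affect the count.
    counter = DivClassCounter(wanted_class)
    counter.feed(html_text)
    counter.close()
    return counter.count


if __name__ == "__main__":
    broken = '<div class="list"><div class="item js-item">A<div class="item">B</div>'
    print(count_div_class(broken))  # 2
```

In scrapy shell you could then compare count_div_class(response.text) with len(response.css('div.item')). If the first is non-zero and the second is 0, the elements are most likely being lost while the document tree is built, not by the CSS selector itself.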

Issue Analytics

  • State: closed
  • Created 7 years ago
  • Reactions: 2
  • Comments: 5 (2 by maintainers)

Top GitHub Comments

1 reaction
eLRuLL commented, May 6, 2016

This is an lxml problem with unclosed tags, but it can be worked around with something like BeautifulSoup:

In [10]: from BeautifulSoup import BeautifulSoup
In [11]: soup = BeautifulSoup(response.body)
In [12]: from scrapy import Selector
In [13]: sel = Selector(text=soup.prettify())
In [14]: sel.css('div.item')
Out[14]: 
[<Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item item-variacao" '>,
 <Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item item-variacao" '>,
 <Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item " data-layer=\'{'>,
...
1 reaction
rafaelcapucho commented, May 6, 2016

The target HTML has some errors, such as unclosed div tags:

[screenshot: error]

I downloaded the raw HTML, fixed the tags, and now Scrapy's response.css processes it fine. Is there any configuration in the Scrapy parser to ignore or auto-close those unclosed tags?
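One workaround, short of fixing the HTML by hand, is to balance the tags programmatically before building the selector. The sketch below is hypothetical (TagBalancer and balance_tags are illustrative names, and it has not been tried against this particular page): it re-serializes the markup with the standard-library html.parser, drops end tags that have no matching opener, and closes any tags still open when an enclosing tag closes or the input ends. It is deliberately simple and does not implement HTML5 implied-end-tag rules, for example for p or li.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never have a closing tag in HTML.
VOID_ELEMENTS = {
    "area", "base", "br", "col", "embed", "hr", "img", "input",
    "link", "meta", "source", "track", "wbr",
}
# html.parser passes script/style text through raw, so it must not be re-escaped.
RAW_TEXT_ELEMENTS = {"script", "style"}


class TagBalancer(HTMLParser):
    """Hypothetical pre-cleaner: re-emits the markup with every opened tag closed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # serialized pieces
        self.stack = []  # currently open (non-void) tags, innermost last

    @staticmethod
    def _attrs(attrs):
        # Attribute values arrive with entities decoded, so escape them again.
        parts = []
        for name, value in attrs:
            if value is None:
                parts.append(f" {name}")
            else:
                parts.append(f' {name}="{escape(value, quote=True)}"')
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append(f"<{tag}{self._attrs(attrs)}>")
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching opener: drop it
        # Close every tag left open inside this one, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1] in RAW_TEXT_ELEMENTS:
            self.out.append(data)  # script/style text is emitted verbatim
        else:
            # Entities were decoded by convert_charrefs, so re-encode &, < and >.
            self.out.append(escape(data, quote=False))

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")

    def handle_decl(self, decl):
        self.out.append(f"<!{decl}>")

    def handle_pi(self, data):
        self.out.append(f"<?{data}>")

    def unknown_decl(self, data):
        self.out.append(f"<![{data}]>")

    def close(self):
        super().close()  # flush any buffered text first
        # Close whatever is still open at the end of input, innermost first.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")


def balance_tags(raw_html):
    balancer = TagBalancer()
    balancer.feed(raw_html)
    balancer.close()
    return "".join(balancer.out)
```

It would then be used as Selector(text=balance_tags(response.text)).css('div.item'), with Selector imported from scrapy.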

Thank you
