response.css can't find elements that are there

Hello,

When running scrapy shell https://www.belezanaweb.com.br/homens/

And executing a simple:

In [2]: response.css('div.item').extract()
Out[2]: []

The item is there: [screenshot: rsz_2screenshot_at_2016-05-06_014842]

It isn't loaded by JavaScript, because it is visible in the output of curl https://www.belezanaweb.com.br/homens/ | less:

[screenshot at 2016-05-06 01 53 45]

Could it be a parser bug? I'm running Scrapy 1.1.0rc4. Thank you.

Top GitHub Comments

eLRuLL commented, May 6, 2016

This is an lxml problem with unclosed tags, but it can be worked around using something like BeautifulSoup:

In [10]: from BeautifulSoup import BeautifulSoup
In [11]: soup = BeautifulSoup(response.body)
In [12]: from scrapy import Selector
In [13]: sel = Selector(text=soup.prettify())
In [14]: sel.css('div.item')
Out[14]: 
[<Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item item-variacao" '>,
 <Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item item-variacao" '>,
 <Selector xpath=u"descendant-or-self::div[@class and contains(concat(' ', normalize-space(@class), ' '), ' item ')]" data=u'<div class="item js-item " data-layer=\'{'>,
...

rafaelcapucho commented, May 6, 2016

The target HTML has some errors, such as unclosed div tags:

[screenshot: error]

I downloaded the raw HTML, fixed the tags, and now Scrapy's response.css works fine. Is there any configuration in the Scrapy parser to ignore or auto-close those unclosed tags?

Thank you
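
For anyone who wants to locate unclosed tags like the ones mentioned above before fixing them by hand, here is a minimal sketch using only Python's standard html.parser module. The names UnclosedTagFinder and find_unclosed are made up for this illustration and are not part of Scrapy or lxml. The check is deliberately naive: HTML allows some end tags (for example p and li) to be omitted, so treat the results as hints rather than definite errors.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class UnclosedTagFinder(HTMLParser):
    """Hypothetical helper: collect start tags that never get an end tag."""

    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, (line, column)) still open
        self.unclosed = []   # tags skipped over by a later end tag

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, self.getpos()))

    def handle_endtag(self, tag):
        # Look for the nearest open tag with the same name.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                # Everything opened after it was never closed.
                self.unclosed.extend(self.open_tags[i + 1:])
                del self.open_tags[i:]
                return
        # A stray end tag with no matching start tag is ignored here.


def find_unclosed(html):
    """Return a list of (tag, (line, column)) for tags missing an end tag."""
    finder = UnclosedTagFinder()
    finder.feed(html)
    finder.close()
    # Whatever is still on the stack at end of input was never closed.
    return finder.unclosed + finder.open_tags


if __name__ == "__main__":
    sample = "<ul><li><span>x</li></ul>"
    for tag, (line, col) in find_unclosed(sample):
        print(f"<{tag}> opened at line {line}, column {col} is never closed")
```

In a Scrapy shell session the same function could be fed response.text. Note that when a div is left unclosed, the report points at the outermost div left without a partner, which is not necessarily the one the page author forgot to close.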