
New selector method: extract_first()

See original GitHub issue

I have a suggestion for improving Scrapy's Selector. I've seen this construction in many projects:

result = sel.xpath('//div/text()').extract()[0]

And what about the if result: / else: or try: / except: handling, which should always be there? When we don't want ItemLoaders, the most common use of a selector is retrieving a single element. Maybe there should be a method such as xpath1, xpath_one or xpath_first that returns the first matched element, or None?
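The guard described above can be sketched in plain Python. This is only an illustration: `first_or_none` is a hypothetical helper, not a Scrapy API, and ordinary lists stand in for the list of strings that `.extract()` returns.

```python
def first_or_none(results, default=None):
    """Return the first extracted string, or `default` when nothing matched."""
    return results[0] if results else default


# Plain lists stand in for the output of .extract() (an assumption for this sketch).
matched = ['foo', 'bar']
unmatched = []

# The unguarded pattern `results[0]` raises IndexError on an empty result,
# which is why the try/except (or if/else) has to be written every time.
try:
    value = unmatched[0]
except IndexError:
    value = None

# The helper folds that guard into a single call.
first_match = first_or_none(matched)
no_match = first_or_none(unmatched)
with_default = first_or_none(unmatched, default='n/a')
```

A method on the selector result itself would remove the need for such a helper in every project, which is the point of the proposal.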

Issue Analytics

State:
Created 10 years ago
Reactions: 1
Comments: 54 (41 by maintainers)

Top GitHub Comments

1 reaction

shirk3y commented, Jan 30, 2014

It also could be like this:

sel.css('span').extract_first()

Maybe it would prevent constructions that aren't very clear, e.g.:

sel.css('span').extract(True)

2014-01-30 Daniel Graña notifications@github.com

I think the problem is not in the selection methods .xpath() and .css() but in .extract(); we could add a parameter to get the first result.

@darkrho (https://github.com/darkrho): in #569 (https://github.com/scrapy/scrapy/pull/569) you propose .get(), but it doesn't extract. Instead it returns the first element of a SelectorList, which still needs a call to .extract() to get the result wanted in this issue:

sel.css('span').get(0).extract()

u'foo'

I would go for something like this:

sel.css('span').extract(first=True)

u'<span>foo</span>'

I think the common case is always getting the first element; indexing can still be handled with the xpath or css methods.

sel.css('span').extract(takefirst=True)

u'<span>foo</span>'
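To make the dedicated-method option from this thread concrete, here is a minimal sketch of how extract_first() could sit next to extract() on a list of selectors. `FakeSelector` and `FakeSelectorList` are hypothetical stand-ins written for this example, not Scrapy's classes, and the `default` parameter is an assumption of the sketch.

```python
class FakeSelector:
    """Hypothetical stand-in for a selector holding one matched fragment."""

    def __init__(self, markup):
        self.markup = markup

    def extract(self):
        return self.markup


class FakeSelectorList(list):
    """Hypothetical stand-in for a list of selectors."""

    def extract(self):
        # Existing behaviour: one string per matched element.
        return [selector.extract() for selector in self]

    def extract_first(self, default=None):
        # Proposed behaviour: the first match as a string, or `default` if empty.
        for selector in self:
            return selector.extract()
        return default


spans = FakeSelectorList([FakeSelector('<span>foo</span>'),
                          FakeSelector('<span>bar</span>')])
empty = FakeSelectorList()

all_spans = spans.extract()
first_span = spans.extract_first()
missing = empty.extract_first()
fallback = empty.extract_first(default='')
```

Compared with a flag such as extract(first=True), a separate method keeps the return type of each call fixed: extract() always gives a list, and extract_first() always gives a single value or the default.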


0 reactions

curita commented, Mar 19, 2015

Closed by ff64584
