
New selector method: extract_first()


I have a suggestion for improving the Scrapy Selector. I've seen this construction in many projects:

result = sel.xpath('//div/text()').extract()[0]

And what about the if result: / else: or try: / except: blocks that always have to surround it? When we aren't using ItemLoaders, the most common use of a selector is retrieving a single element. Maybe there should be a method such as xpath1, xpath_one or xpath_first that returns the first matched element or None?
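A minimal sketch of the requested behaviour, written in plain Python so it runs without Scrapy installed. The list literals stand in for what sel.xpath('//div/text()').extract() would return, and first_or_none is a hypothetical helper name, not part of Scrapy's API:

```python
from typing import Optional, Sequence


def first_or_none(values: Sequence[str], default: Optional[str] = None) -> Optional[str]:
    """Return the first extracted value, or `default` when nothing matched.

    Hypothetical helper: it only illustrates the behaviour requested above.
    """
    # Looping instead of indexing means an empty result never raises IndexError.
    for value in values:
        return value
    return default


# Stand-ins for the list returned by sel.xpath('//div/text()').extract().
matched = ["foo", "bar"]
unmatched: list = []

# Without a helper, every lookup needs its own guard.
try:
    result = unmatched[0]
except IndexError:
    result = None

print(result)                     # None
print(first_or_none(matched))     # foo
print(first_or_none(unmatched))   # None
```

The try/except block and the single helper call produce the same value; the helper just moves the guard into one place.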

Issue Analytics

  • State: closed
  • Created 10 years ago
  • Reactions: 1
  • Comments: 54 (41 by maintainers)

Top GitHub Comments

1 reaction
shirk3y commented, Jan 30, 2014

It also could be like this:

sel.css('span').extract_first()

This might discourage less readable constructions, e.g.:

sel.css('span').extract(True)

On 2014-01-30, Daniel Graña <notifications@github.com> wrote:

I think the problem is not in the selecting methods .xpath() and .css() but in .extract(); we could add a parameter to it to get the first result.

@darkrho (https://github.com/darkrho): in #569 (https://github.com/scrapy/scrapy/pull/569) you propose .get(), but it doesn't extract. Instead it returns the first element of the SelectorList, which still needs a call to .extract() to get the result this issue asks for:

sel.css('span').get(0).extract()
u'foo'

I would go for something like this:

sel.css('span').extract(first=True)
u'<span>foo</span>'

I think the use case is always getting the first element; other indexes can still be handled through the xpath or css methods.

sel.css('span').extract(takefirst=True)
u'<span>foo</span>'

Reply to this email directly or view it on GitHub: https://github.com/scrapy/scrapy/issues/568#issuecomment-33690050
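The three API shapes discussed in this thread (.get(0).extract(), a flag on extract(), and a dedicated extract_first()) can be compared with a toy model. ToySelector and ToySelectorList are hypothetical classes written only for illustration; they are not Scrapy's implementation, and a real Selector.extract() serializes a parsed node rather than returning stored markup:

```python
from typing import List, Optional, Union


class ToySelector:
    """Hypothetical stand-in for one Scrapy Selector (illustration only)."""

    def __init__(self, markup: str) -> None:
        self.markup = markup

    def extract(self) -> str:
        # A real Selector serializes its parsed node; the toy returns stored markup.
        return self.markup


class ToySelectorList(list):
    """Hypothetical stand-in for SelectorList, comparing the proposed API shapes."""

    def extract(self, first: bool = False) -> Union[List[str], Optional[str]]:
        # Shape from the email: a flag on extract() switches to "first only".
        values = [sel.extract() for sel in self]
        if first:
            return values[0] if values else None
        return values

    def get(self, index: int) -> ToySelector:
        # Shape attributed to #569: returns a Selector, so .extract() is still needed.
        # In this toy, a missing index raises IndexError like plain list indexing.
        return self[index]

    def extract_first(self) -> Optional[str]:
        # Shape from shirk3y's comment: a dedicated method, no flag.
        for sel in self:
            return sel.extract()
        return None


spans = ToySelectorList([ToySelector("<span>foo</span>"), ToySelector("<span>bar</span>")])
empty = ToySelectorList()

print(spans.get(0).extract())     # <span>foo</span>
print(spans.extract(first=True))  # <span>foo</span>
print(spans.extract_first())      # <span>foo</span>
print(empty.extract_first())      # None
```

One design point the toy makes visible: with a flag, extract() returns either a list or a single string depending on an argument, whereas a separate extract_first() keeps each method's return type fixed.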

0 reactions
curita commented, Mar 19, 2015

Closed by ff64584


Top Results From Across the Web

Selectors — Scrapy 2.7.1 documentation
Selectors ¶. When you're scraping web pages, the most common task you need to perform is to extract data from the HTML source....
Extract first element with XPath and scrapy
There is a new Scrapy built in method get() can be used instead of extract_first() which always returns a string and None if...
Selectors - Scrapy documentation - Read the Docs
Scrapy comes with its own mechanism for extracting data. They're called selectors because they “select” certain parts of the HTML document specified either...
Scrapy - Extracting Items
Scrapy - Extracting Items, For extracting data from web pages, Scrapy uses a technique called selectors based on XPath and CSS expressions.
Selectors
Selector also has a .re() method for extracting data using regular expressions. However, unlike using .xpath() and .css() methods, .re() returns ...
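The results above mention .get() as a newer replacement for extract_first() that returns a string or None, and .re() for regex-based extraction. The sketch below reproduces the "first match, or a default" idea using only the standard library, so it runs without Scrapy. get_first and regex_first are hypothetical helper names, and xml.etree.ElementTree supports only a subset of XPath, so this is not how Scrapy's selectors work internally:

```python
import re
import xml.etree.ElementTree as ET
from typing import Iterable, Optional

# A small, well-formed XHTML snippet; ElementTree requires valid XML.
PAGE = """
<html><body>
  <div class="price">Price: 42 EUR</div>
  <div class="price">Price: 17 EUR</div>
</body></html>
"""


def get_first(values: Iterable[str], default: Optional[str] = None) -> Optional[str]:
    """Hypothetical analogue of .get() / .extract_first(): first value or default."""
    return next(iter(values), default)


def regex_first(values: Iterable[str], pattern: str,
                default: Optional[str] = None) -> Optional[str]:
    """Hypothetical helper: first regex match across values, or default."""
    regex = re.compile(pattern)
    for value in values:
        match = regex.search(value)
        if match:
            # Return group 1 when the pattern has a group, else the whole match.
            return match.group(1) if regex.groups else match.group(0)
    return default


root = ET.fromstring(PAGE.strip())
# ".//div[@class='price']" is within ElementTree's limited XPath support.
texts = [div.text or "" for div in root.findall(".//div[@class='price']")]

print(get_first(texts))                   # Price: 42 EUR
print(get_first([], default="n/a"))       # n/a
print(regex_first(texts, r"(\d+) EUR"))   # 42
print(regex_first(texts, r"(\d+) USD"))   # None
```

Passing a default makes the "nothing matched" case explicit at the call site instead of relying on None checks later.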
