custom xpath support
Could custom XPath functions be added here? For example:
# Original Source: https://gist.github.com/shirk3y/458224083ce5464627bc
from lxml import etree

# XPath test for a single class name inside a whitespace-separated @class value
CLASS_EXPR = "contains(concat(' ', normalize-space(@class), ' '), ' {} ')"


def has_class(context, *classes):
    """
    This lxml extension makes it easier to select elements by CSS class.

    >>> ns = etree.FunctionNamespace(None)
    >>> ns['has-class'] = has_class
    >>> root = etree.XML('''
    ... <a>
    ...     <b class="one first text">I</b>
    ...     <b class="two text">LOVE</b>
    ...     <b class="three text">CSS</b>
    ... </a>
    ... ''')
    >>> len(root.xpath('//b[has-class("text")]'))
    3
    >>> len(root.xpath('//b[has-class("one")]'))
    1
    >>> len(root.xpath('//b[has-class("text", "first")]'))
    1
    >>> len(root.xpath('//b[not(has-class("first"))]'))
    2
    >>> len(root.xpath('//b[has-class("not-exists")]'))
    0
    """
    # Require every requested class to be present on the context node
    expressions = ' and '.join([CLASS_EXPR.format(c) for c in classes])
    xpath = 'self::*[@class and {}]'.format(expressions)
    return bool(context.context_node.xpath(xpath))
I think it is common practice to create custom XPath functions in different projects.
Issue Analytics
- State:
- Created 8 years ago
- Reactions:1
- Comments:8 (8 by maintainers)
Top GitHub Comments
I’m closing this ticket, as parsel has a has_class function built in now, and provides a simplified way to register custom XPath functions (via parsel.xpathfuncs.set_xpathfunc) - see http://parsel.readthedocs.io/en/latest/usage.html#other-xpath-extensions.
So I looked at this today and wanted to “benchmark” different implementations.
I compared:
- has-class using an XPath call within the Python function (https://github.com/scrapy/parsel/issues/13#issue-100686360)
- set comparisons (https://github.com/scrapy/scrapy/issues/753#issuecomment-51502883)
- set but for 1 class only (when it makes sense)
I used this script with the homepage of the New York Times, directly on the lxml-parsed document:
And this is what I get:
So there always seems to be a non-negligible penalty when using custom XPath/Python functions. Using cssselect translation to XPath looks faster in all cases. Can someone else double check this?
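For anyone who wants to double check, here is a minimal sketch of such a comparison. It is not the script used above: the synthetic document, the function names has-class-xpath and has-class-set, and the set-based implementation are assumptions made for illustration, and the "plain XPath" query is simply the CLASS_EXPR expression from the issue written inline rather than actual cssselect output. No timings are claimed here; run it on your own documents.

```python
import timeit

from lxml import etree

CLASS_EXPR = "contains(concat(' ', normalize-space(@class), ' '), ' {} ')"


def has_class_xpath(context, *classes):
    # Variant from the issue: run a nested XPath query on the context node.
    expressions = " and ".join(CLASS_EXPR.format(c) for c in classes)
    xpath = "self::*[@class and {}]".format(expressions)
    return bool(context.context_node.xpath(xpath))


def has_class_set(context, *classes):
    # Assumed set-based variant: compare class names as Python sets.
    value = context.context_node.get("class")
    return value is not None and set(classes) <= set(value.split())


# Register both functions globally under hypothetical names.
ns = etree.FunctionNamespace(None)
ns["has-class-xpath"] = has_class_xpath
ns["has-class-set"] = has_class_set

# Synthetic stand-in for a real page: 1000 <b> elements, half with class "text".
root = etree.XML(
    "<a>" + '<b class="one text">x</b><b class="two">y</b>' * 500 + "</a>"
)

QUERIES = {
    "XPath call in Python": '//b[has-class-xpath("text")]',
    "set comparison": '//b[has-class-set("text")]',
    "plain XPath": "//b[@class and {}]".format(CLASS_EXPR.format("text")),
}

# All three queries should select the same number of elements.
counts = {name: len(root.xpath(query)) for name, query in QUERIES.items()}


def benchmark(number=20):
    """Return the total seconds spent running each query `number` times."""
    return {
        name: timeit.timeit(lambda q=query: root.xpath(q), number=number)
        for name, query in QUERIES.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("{:<22} {:.4f}s".format(name, seconds))
```

Checking that the counts agree before timing guards against comparing queries that do not select the same nodes. Results on a small synthetic tree may not carry over to a large real-world page.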