https://pytroubles.com/en/posts/id1359-lxml-xpath-vs-find-findall-findtext-elementpath-limitations-common-pitfalls-and-fixes

lxml xpath() vs find()/findall()/findtext(): ElementPath limitations, common pitfalls, and fixes

lxml ElementPath vs XPath: when to use find()/findall()/findtext() and when to switch to xpath()

Learn the real difference between lxml xpath() and find()/findall()/findtext(): ElementPath limits vs full XPath, errors using text(), and correct usage patterns.

2025-10-29T05:00:07+03:00

When working with lxml, it’s tempting to reach for xpath() by default. It’s powerful and familiar. But if you’ve noticed that some calls to xpath() could be replaced with findall(), you’re not wrong: the methods overlap in intent but not in capability. Understanding where they differ helps you choose the right tool and avoid subtle errors.

What’s the actual difference?

lxml.etree supports the simple path syntax of the find, findall and findtext methods on ElementTree and Element, as known from the original ElementTree library (ElementPath). As an lxml specific extension, these classes also provide an xpath() method that supports expressions in the complete XPath syntax, as well as custom extension functions.

ElementPath
The ElementTree library comes with a simple XPath-like path language called ElementPath. The main difference from XPath is that you can use the {namespace}tag notation in ElementPath expressions. However, advanced features like value comparison and functions are not available.
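As a minimal sketch of the {namespace}tag notation mentioned above, the snippet below queries the same cells both ways. The namespaced document and the URI http://example.com/tbl are made up for illustration and are not from the original question.

```python
from lxml import etree

# Hypothetical namespaced document, used only for illustration.
xml_ns = (
    '<t:table xmlns:t="http://example.com/tbl">'
    "<t:tr><t:td>One</t:td><t:td>Two</t:td></t:tr>"
    "</t:table>"
)
root = etree.fromstring(xml_ns)

# ElementPath: the namespace URI is written inline as {namespace}tag.
cells_ep = root.findall(".//{http://example.com/tbl}td")

# XPath: prefixes are resolved through a namespaces mapping instead.
cells_xp = root.xpath(".//t:td", namespaces={"t": "http://example.com/tbl"})

ep_texts = [cell.text for cell in cells_ep]
xp_texts = [cell.text for cell in cells_xp]
print(ep_texts, xp_texts)
```

Both queries select the same two elements; only the way the namespace is spelled differs.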

Called with a bare tag name, Element.findall() finds only elements with that tag which are direct children of the current element.
Element.find() finds the first child with a particular tag, and Element.text accesses the element’s text content. A path such as .//td extends the search to all descendants.

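This means that the first argument to findall() is not a full XPath expression. It is a simple ElementPath string, good for straightforward structural navigation: child and descendant steps by tag, plus simple predicates such as [@attr='value']. It always returns elements. In contrast, xpath() evaluates a full XPath expression against the context node and can also return strings, numbers, and booleans.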
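To make the reach of an ElementPath string concrete, here is a small sketch of what findall() does with a bare tag, a descendant path, and a simple attribute predicate. The id and class attributes are added to the sample table purely for illustration.

```python
from lxml import etree

# Sample table with illustrative attributes (not part of the original example).
root = etree.fromstring(
    '<table><tr id="r1"><td>One</td><td class="num">Two</td></tr></table>'
)

direct_rows = root.findall("tr")    # bare tag: direct children only
direct_cells = root.findall("td")   # td is a grandchild, so nothing matches
all_cells = root.findall(".//td")   # .// searches the whole subtree
num_cells = root.findall(".//td[@class='num']")  # simple attribute predicate

print(len(direct_rows), len(direct_cells), len(all_cells), num_cells[0].text)
```

Every result here is a list of elements; ElementPath has no way to hand back a text node or an attribute value directly.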

Problem example: using findall() as if it understood full XPath

The following snippet tries to fetch text nodes from descendant td elements using findall(). This is where things go wrong: ElementPath can only select elements, so it has no equivalent of XPath’s text().

from lxml import etree
xml_blob = "<table><tr><td>One</td><td>Two</td></tr></table>"
root_node = etree.fromstring(xml_blob)
# text() is full XPath syntax; this line assumes findall() understands it.
# ElementPath cannot parse it, so this call raises an exception.
cell_texts = root_node.findall(".//td/text()")

The failure is not an lxml bug. It’s a mismatch between what ElementPath supports and what a full XPath engine provides. As noted above, ElementPath does not implement advanced features like functions.
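A hedged sketch of how the failure surfaces at runtime: the call raises while the path is being parsed, before any element is visited. The exact exception type is treated as an assumption here (it may differ between lxml versions), so the snippet catches broadly and only reports the type name.

```python
from lxml import etree

root = etree.fromstring("<table><tr><td>One</td><td>Two</td></tr></table>")

try:
    root.findall(".//td/text()")
    outcome = "no error"
except Exception as exc:  # exact exception type deliberately not assumed
    outcome = type(exc).__name__

print(outcome)
```

In real code it is usually better to avoid the invalid path altogether than to catch the error.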

Why it fails: ElementPath vs full XPath

find(), findall(), and findtext() rely on a simple, XPath-like language called ElementPath. Its scope is intentionally limited: it selects elements by tag along child (tag), descendant (.//tag), and parent (..) steps, with a few simple predicates, and it cannot return text nodes or attribute values. Expression-level features such as functions, general value comparisons, and other axes are not available. Meanwhile, xpath() runs a complete XPath expression, including node tests such as text() and functions such as contains().

The takeaway is straightforward: calls that need full XPath capabilities must use xpath(). Calls that only select elements by a simple tag path can use find*().
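For a sense of what "full XPath capabilities" covers, the sketch below uses three things ElementPath cannot express: an attribute-value result, a function that returns a number, and a function inside a predicate. The id attributes are illustrative additions to the sample table.

```python
from lxml import etree

# Sample table with illustrative id attributes.
root = etree.fromstring(
    '<table><tr><td id="a">One</td><td id="b">Two</td></tr></table>'
)

ids = root.xpath(".//td/@id")          # attribute values, returned as strings
n_cells = root.xpath("count(.//td)")   # numeric result (a float)
with_w = root.xpath(".//td[contains(text(), 'w')]")  # function in a predicate

print(ids, n_cells, [td.text for td in with_w])
```

Note that the return type of xpath() depends on the expression: a list of elements, a list of strings, or a single number, as above.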

The fix: use xpath() for full expressions, use findall() for direct children

To extract text nodes from descendant td elements, switch to xpath().

from lxml import etree
xml_payload = "<table><tr><td>One</td><td>Two</td></tr></table>"
doc_root = etree.fromstring(xml_payload)
# Full XPath expression: text() selects the text nodes of each td
texts = doc_root.xpath(".//td/text()")
# texts == ["One", "Two"]

On the other hand, if you only need to walk direct children by tag name, stick with findall(). This aligns with ElementPath’s intended use.

from lxml import etree
snippet = "<table><tr><td>One</td><td>Two</td></tr></table>"
root_el = etree.fromstring(snippet)
# Direct children selection via ElementPath
rows = root_el.findall("tr")
first_row_cells = rows[0].findall("td")
values = [cell.text for cell in first_row_cells]
# values == ["One", "Two"]
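The article names findtext() but never shows it, so here is a minimal sketch of it alongside a descendant findall(), using the same sample table. The th lookup is just an illustrative miss to show the default argument.

```python
from lxml import etree

root = etree.fromstring("<table><tr><td>One</td><td>Two</td></tr></table>")

first_text = root.findtext(".//td")               # text of the first matching td
missing = root.findtext(".//th", default="n/a")   # fallback when nothing matches
all_texts = [td.text for td in root.findall(".//td")]

print(first_text, missing, all_texts)
```

One difference worth keeping in mind: Element.text is only the text before an element's first child, whereas the XPath text() step yields every text node that is a direct child of the element. For cells with mixed content the two approaches can give different results.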

What the path argument to findall() really is

The path parameter is an ElementPath string. It is a “simple path syntax” as described above, not a full XPath expression. That’s why constructs like text() do not work there, while they work in xpath().

Why this distinction matters

Using the right method for the right level of expressiveness helps avoid brittle code and unexplained errors. If you rely on full XPath features, reach for xpath(). If the need is to pick direct children by tag and keep queries straightforward, find*() is a good fit. Keeping this boundary in mind leads to cleaner, more predictable XML handling code.

Conclusion

In lxml, find(), findall(), and findtext() implement ElementPath: a simple, limited path language geared toward basic structural traversal by tag. xpath() evaluates complete XPath expressions with advanced features, including functions and node tests such as text(). Use find*() when selecting elements by a simple tag path, and switch to xpath() when you need full XPath power, such as functions or more complex queries.

The article is based on a question from StackOverflow by Moberg and an answer by LMC.

lxml python