2025, Nov 25 23:00

Fixing lxml XMLSchema validation on Linux when xml:specialAttrs fails to resolve, using an XML catalog or a local xml.xsd

Learn why lxml XMLSchema validation fails on Linux with libxml2 when xml:specialAttrs won't resolve, and fix it using an XML catalog or a local xml.xsd.

Fixing XMLSchema validation with lxml on Linux when xml:specialAttrs fails to resolve

Validating XML against an XSD can unexpectedly break across platforms. A common failure that surfaced with recent Linux stacks looks like this:

lxml.etree.XMLSchemaParseError: Element '{http://www.w3.org/2001/XMLSchema}attributeGroup', attribute 'ref': The QName value '{http://www.w3.org/XML/1998/namespace}specialAttrs' does not resolve to a(n) attribute group definition., line 15

The same code validates fine on Windows but fails on Linux with lxml 6.x paired with a newer libxml2. The scenario below reproduces the issue, explains what changed, and shows how to make validation robust again.

Minimal setup that triggers the error

The document under test uses XInclude and is passed through a small XSLT before being validated against a schema that imports the XML namespace schema. The xs:import and the reference to xml:specialAttrs are the critical pieces.

main.xml

<?xml version="1.0" encoding="UTF-8"?>
<root xmlns:xi="http://www.w3.org/2001/XInclude">
    <title>Main XML</title>
    <elements>
        <element name="main element" foo="main foo">This text is from main.xml</element>
        <xi:include href="include.xml" parse="xml" xpointer="xpointer(/elements/element)"/>
    </elements>
</root>

include.xml

<?xml version="1.0" encoding="UTF-8"?>
<elements>
    <element name="element1" foo="foo1">Text 1: This content is included from another file.</element>
    <element name="element2" foo="foo2">Text 2: This content is included from another file.</element>
    <element name="element3" foo="foo3">Text 3: This content is included from another file.</element>
</elements>

transform.xslt

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:template match="@* | node()">
        <xsl:copy>
            <xsl:apply-templates select="@* | node()"/>
        </xsl:copy>
    </xsl:template>
    <xsl:template match="element[@name='element2']">
        <xsl:copy>
            <xsl:apply-templates select="@*"/>
            <xsl:attribute name="foo">spam</xsl:attribute>
            <xsl:attribute name="name">message99</xsl:attribute>
            <xsl:apply-templates select="node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

schema.xsd

<?xml version="1.0" encoding="UTF-8"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" elementFormDefault="qualified">
    <xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="http://www.w3.org/2001/xml.xsd"/>
    <xs:element name="root">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="title" type="xs:string"/>
                <xs:element name="elements">
                    <xs:complexType>
                        <xs:sequence minOccurs="1" maxOccurs="unbounded">
                            <xs:element name="element" minOccurs="1" maxOccurs="unbounded">
                                <xs:complexType mixed="true">
                                    <xs:attribute name="name" type="xs:string" use="required"/>
                                    <xs:attribute name="foo" type="xs:string" use="required"/>
                                    <xs:attributeGroup ref="xml:specialAttrs"/>
                                </xs:complexType>
                            </xs:element>
                        </xs:sequence>
                    </xs:complexType>
                </xs:element>
            </xs:sequence>
        </xs:complexType>
    </xs:element>
</xs:schema>

Python validation script

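#!/usr/bin/env python3
import os
import lxml
from lxml import etree
print("Using lxml version {0}".format(lxml.__version__), end="\n\n")
# Parse the main document and expand its xi:include elements in place.
xml_doc = etree.parse("main.xml")
xml_doc.xinclude()
# Optionally apply the XSLT and put the transformed root back into the tree.
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    style_tree = etree.parse("transform.xslt")
    xslt_proc = etree.XSLT(style_tree)
    transformed = xslt_proc(xml_doc)
    # Note: _setroot() is a private lxml method.
    xml_doc._setroot(transformed.getroot())
print(etree.tostring(xml_doc, pretty_print=True).decode())
# Compiling the schema is the step that raises XMLSchemaParseError
# when the xs:import for the XML namespace cannot be resolved.
xsd_obj = etree.XMLSchema(etree.parse("schema.xsd"))
if xsd_obj.validate(xml_doc):
    print("XML is valid.")
else:
    print("XML is invalid!")
    # Each error_log entry describes one validation problem.
    for entry in xsd_obj.error_log:
        print(entry.message)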

On Linux with lxml 6.x paired with a newer libxml2, the run ends with the XMLSchemaParseError shown above. On Windows, the same project validates.

What is actually going on

Recent libxml2 releases, for security reasons, expect external resources to be resolved through XML catalogs rather than fetched from the network. When the schema imports the XML namespace schema by its http:// URL, the resolver can refuse to load it. The schema component that defines xml:specialAttrs is then missing, the QName cannot be resolved, and schema compilation fails with the error above. The difference between platforms comes down to the library versions in use: the newer libxml2 on Linux enforces the policy, while the Windows setup with an older libxml2 still validates. The same failure can be reproduced with the command-line xmllint from the latest libxml2, which shows that this is not an lxml-only issue.
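Because the behavior follows the libxml2 version rather than the operating system, it can help to print the versions each environment actually uses. The sketch below relies on the version tuples that lxml exposes on lxml.etree (LXML_VERSION, LIBXML_VERSION, LIBXML_COMPILED_VERSION); the helper names dotted and report are made up for this example.

```python
import platform


def dotted(version):
    """Turn a version tuple such as (2, 14, 4) into '2.14.4'."""
    return ".".join(str(part) for part in version)


def report():
    """Collect the platform and library versions relevant to this issue."""
    lines = ["platform: " + platform.system()]
    try:
        from lxml import etree
    except ImportError:
        lines.append("lxml: not installed")
        return lines
    lines.append("lxml: " + dotted(etree.LXML_VERSION))
    # Version of libxml2 that is loaded at runtime.
    lines.append("libxml2 (runtime): " + dotted(etree.LIBXML_VERSION))
    # Version of libxml2 that lxml was compiled against.
    lines.append("libxml2 (compiled): " + dotted(etree.LIBXML_COMPILED_VERSION))
    return lines


print("\n".join(report()))
```

Running this on the working and the failing machine and comparing the libxml2 lines is usually enough to tell whether the two setups differ at the library level.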

The right fix: use an XML catalog

Configure a local XML catalog that maps the external schema URI to a local file, and point libxml2 to that catalog. The catalog entry has to match the schemaLocation used in the xs:import exactly. Place a local copy of the XML namespace schema next to the catalog.

Download the schema once

wget "http://www.w3.org/2001/xml.xsd"

Create catalog.xml

<?xml version="1.0"?>
<!DOCTYPE catalog PUBLIC "-//OASIS//DTD XML Catalogs V1.0//EN"
                      "http://www.oasis-open.org/committees/entity/release/1.0/catalog.dtd">
<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <public publicId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <system systemId="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
  <uri name="http://www.w3.org/2001/xml.xsd"
          uri="xml.xsd"/>
</catalog>
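A catalog only helps if one of its entries matches the schemaLocation of the import character for character. As a rough check, the sketch below lists the identifiers a catalog maps, using only the standard library. The function name catalog_identifiers is hypothetical, and it only looks at the public, system, and uri entries used above, not at rewrite or delegate rules.

```python
import xml.etree.ElementTree as ET

CATALOG_NS = "urn:oasis:names:tc:entity:xmlns:xml:catalog"

# Attribute that carries the identifier for each entry type used above.
ID_ATTRIBUTES = {"public": "publicId", "system": "systemId", "uri": "name"}


def catalog_identifiers(catalog_text):
    """Return {identifier: local uri} for public/system/uri entries."""
    root = ET.fromstring(catalog_text)
    mapping = {}
    for tag, attribute in ID_ATTRIBUTES.items():
        for entry in root.iter("{%s}%s" % (CATALOG_NS, tag)):
            mapping[entry.get(attribute)] = entry.get("uri")
    return mapping


catalog = """<catalog xmlns="urn:oasis:names:tc:entity:xmlns:xml:catalog">
  <system systemId="http://www.w3.org/2001/xml.xsd" uri="xml.xsd"/>
  <uri name="http://www.w3.org/2001/xml.xsd" uri="xml.xsd"/>
</catalog>"""

mapped = catalog_identifiers(catalog)
print("http://www.w3.org/2001/xml.xsd" in mapped)
print("http://www.w3.org/2009/01/xml.xsd" in mapped)
```

The second lookup fails even though both URLs name a W3C copy of the XML namespace schema: a catalog entry is matched on the identifier string, not on what the URL would serve.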

Point lxml to the catalog and validate

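import os
import lxml
from lxml import etree
# Tell libxml2 where the catalog is before the schema is compiled.
# A relative path is resolved against the current working directory.
os.environ["XML_CATALOG_FILES"] = "catalog.xml"
print("Using lxml version {0}".format(lxml.__version__), end="\n\n")
# Compile the schema; the xs:import is now resolved through the catalog.
xsd_tree = etree.parse("schema.xsd")
compiled_xsd = etree.XMLSchema(etree=xsd_tree)
# Parse the document and expand its xi:include elements in place.
src_tree = etree.parse("main.xml")
src_tree.xinclude()
# Optionally apply the XSLT and put the transformed root back into the tree.
if os.path.isfile("transform.xslt"):
    print("Applying transformation from transform.xslt")
    style_tree = etree.parse("transform.xslt")
    xslt_runner = etree.XSLT(style_tree)
    out_doc = xslt_runner(src_tree)
    # Note: _setroot() is a private lxml method.
    src_tree._setroot(out_doc.getroot())
print(etree.tostring(src_tree, pretty_print=True).decode())
if compiled_xsd.validate(src_tree):
    print("XML is valid.")
else:
    print("XML is invalid!")
    # Each error_log entry describes one validation problem.
    for issue in compiled_xsd.error_log:
        print(issue.message)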

This makes resolution deterministic and works with modern libxml2. The same catalog can be verified with xmllint, for example:

XML_CATALOG_FILES='catalog.xml' /home/lmc/Downloads/libxml2-v2.15.0/xmllint --noout --xinclude --schema schema.xsd main.xml 
main.xml validates
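One caveat with a relative value such as catalog.xml is that it is looked up relative to the current working directory, so the catalog is only found when the script is started from the project folder. A possible way around this, sketched below, is to build the value from absolute file URIs. XML_CATALOG_FILES takes a space-separated list; the helper name catalog_env_value is an assumption of this example, and it assumes libxml2 accepts file:// URIs in that variable, which is how catalogs are commonly listed.

```python
import os
from pathlib import Path


def catalog_env_value(base_dir, *catalog_names):
    """Build an XML_CATALOG_FILES value from absolute file:// URIs."""
    base = Path(base_dir).resolve()
    # as_uri() percent-encodes spaces, which would otherwise split the list.
    return " ".join((base / name).as_uri() for name in catalog_names)


# In a script, Path(__file__).resolve().parent is usually a better base
# than the current working directory used here.
os.environ["XML_CATALOG_FILES"] = catalog_env_value(Path.cwd(), "catalog.xml")
print(os.environ["XML_CATALOG_FILES"])
```

As in the script above, the variable should be set before the first schema is compiled in the process.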

Alternative: reference a local copy of xml.xsd in the schema

Another practical option is to vendor the XML namespace schema and point the import to the local file. Download once:

wget "http://www.w3.org/2001/xml.xsd"

Then change the import in schema.xsd to use the local path:

<xs:import namespace="http://www.w3.org/XML/1998/namespace" schemaLocation="xml.xsd"/>

With that change, xmllint and lxml validate successfully without network access.
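To spot imports that still depend on the network, a schema can be scanned for schemaLocation values that start with http:// or https://. This is a minimal standard-library sketch; the function name remote_imports is hypothetical, and it only inspects xs:import, xs:include, and xs:redefine elements in a single schema document.

```python
import xml.etree.ElementTree as ET

XS = "{http://www.w3.org/2001/XMLSchema}"


def remote_imports(xsd_text):
    """Return schemaLocation values that point at http(s) URLs."""
    root = ET.fromstring(xsd_text)
    locations = []
    for tag in ("import", "include", "redefine"):
        for node in root.iter(XS + tag):
            location = node.get("schemaLocation", "")
            if location.startswith(("http://", "https://")):
                locations.append(location)
    return locations


schema = """<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:import namespace="http://www.w3.org/XML/1998/namespace"
             schemaLocation="http://www.w3.org/2001/xml.xsd"/>
  <xs:import namespace="urn:example" schemaLocation="local.xsd"/>
</xs:schema>"""

print(remote_imports(schema))
```

An empty list means every import in that document already points at a local path. Schemas pulled in by those imports would need the same check.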

Why this matters

Schema validation that depends on fetching remote resources is fragile. Newer libxml2 tightened the rules, and projects that used to work now fail when the XML namespace schema cannot be resolved remotely. Catalog-based resolution and local imports remove the network dependency, bring consistent behavior across Linux, macOS, and Windows, and align with secure-by-default parsers.

Takeaways

If validation breaks with an error about xml:specialAttrs not resolving, make external resolution explicit. Use an XML catalog that maps the XML namespace schema URI to a local file, or import a local copy directly from your XSD. You can confirm the environment by printing the library versions and by testing with xmllint; the same failure there indicates it’s a libxml2-level change, not application code. Once resolution is local and deterministic, XInclude, XSLT application, and schema validation work as expected.