2025, Sep 24 01:00

Designing Pydantic models that accept either an ID or an object, enforce XOR, and load lazily

Build Pydantic models that accept either an ID or an object, enforce XOR validation, and keep I/O lazy with caching for clean, type-safe, efficient Python code

Building a model that accepts exactly one of two inputs and computes the missing piece later sounds simple until type checkers and validation rules get in the way. The goal is clear: accept either an ID or the full object, enforce that only one is provided at a time, and keep the expensive fetch lazy so I/O happens only when needed.

Problem setup

The initial approach relies on private attributes and computed properties. It looks workable, but static typing complains and the XOR rule between the two inputs is awkward to enforce.

from typing import Optional

from pydantic import BaseModel, model_validator

class Asset:
    ident: int

def load_asset_by_id(ident: int) -> Asset:
    ...

class AssetEnvelope(BaseModel):
    # Leading underscores make these Pydantic private attributes, not fields,
    # so constructor keywords never populate them.
    _asset_id: Optional[int] = None
    _asset: Optional[Asset] = None

    @model_validator(mode="before")
    @classmethod
    def ensure_reference(cls, payload):
        # Rejects "neither", but nothing here rejects "both".
        if payload.get("_asset_id") is None and payload.get("_asset") is None:
            raise ValueError("Define either _asset_id or _asset")
        return payload  # a "before" validator must hand the data back

    @property
    def asset_id(self) -> int:
        if self._asset_id is None:
            self._asset_id = self.asset.ident
        return self._asset_id  # type checker: might still be None

    @asset_id.setter
    def asset_id(self, ident: int):
        self._asset_id = ident

    @property
    def asset(self) -> Asset:
        if self._asset is None:
            self._asset = load_asset_by_id(self.asset_id)
        return self._asset  # type checker: might still be None

    @asset.setter
    def asset(self, obj: Asset):
        self._asset = obj

AssetEnvelope(_asset_id=5)  # passes validation, yet _asset_id is still None afterwards

What goes wrong and why

There are two intertwined issues. First, the XOR constraint between the two inputs is hard to enforce when the model stores them as private attributes. Pydantic does not populate underscore-prefixed names from constructor input, so the values a "before" validator sees in the raw payload never reach the attributes themselves. You want to accept either one or the other, but never both and never neither. Second, the lazy getters are correct conceptually, yet type checkers point out that the returned values could still be None, because nothing in the Optional annotations tells the type system that one of the two is always set. That leaves you with properties that flag a potential None during static analysis even where the runtime logic is sound. On top of that, the costly I/O to resolve the object from its ID should not run unless the object is actually accessed.
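The private-attribute behavior is easy to confirm in isolation. The snippet below is a minimal sketch assuming Pydantic v2; the Probe model is hypothetical and exists only for this illustration.

```python
from typing import Optional

from pydantic import BaseModel


class Probe(BaseModel):  # hypothetical model, not part of the original code
    # Leading underscore: Pydantic treats this as a private attribute, not a field.
    _asset_id: Optional[int] = None


# With the default configuration, the unknown keyword is silently ignored.
probe = Probe(_asset_id=5)

print(Probe.model_fields)  # no public fields were declared
print(probe._asset_id)     # the private attribute kept its default of None
```

Because the keyword never reaches the attribute, any validation rule written against the raw payload and any property reading the private attribute are looking at two different things.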

Solution that enforces XOR and stays lazy

The idea is straightforward. Use a real field for the identifier and an aliased input for the object. Keep a private cache to hold the resolved object so the fetch remains lazy. Enforce the XOR relation in a model validator. If the object comes in, set the identifier from it so later code can treat the identifier as present without Optional noise.

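from __future__ import annotations

from typing import Optional

from pydantic import BaseModel, ConfigDict, Field, PrivateAttr, model_validator

class Resource:
    key: int

def resolve_resource(key: int) -> Resource:
    ...

class ResourceCarrier(BaseModel):
    # Resource is a plain class, so Pydantic must be told to accept it as a field type.
    model_config = ConfigDict(arbitrary_types_allowed=True)

    ref_id: Optional[int] = None
    # Field names cannot start with an underscore in Pydantic v2; the alias lets
    # callers pass resource=..., and exclude=True keeps it out of model_dump().
    res_in: Optional[Resource] = Field(default=None, alias="resource", exclude=True)
    # Private cache for the resolved object; never part of validation or output.
    _res_cache: Optional[Resource] = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _xor_and_populate(self):
        # True when both are missing or both are present: either way, reject.
        if (self.ref_id is None) == (self.res_in is None):
            raise ValueError("provide exactly one of ref_id or resource")
        if self.res_in is not None:
            self.ref_id = self.res_in.key
        return self

    @property
    def resource(self) -> Resource:
        if self._res_cache is None:
            if self.res_in is not None:
                self._res_cache = self.res_in
            else:
                assert self.ref_id is not None  # guaranteed by the validator
                self._res_cache = resolve_resource(self.ref_id)  # lazy I/O
        return self._res_cache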

With this structure, the model accepts exactly one of the two inputs and guarantees a consistent internal state. Passing an identifier like ResourceCarrier(ref_id=5) defers the expensive call until resource is first accessed. Passing a full object like ResourceCarrier(resource=some_res) sets ref_id to some_res.key and skips any fetch entirely.
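To see the lazy behavior in action, here is a self-contained sketch assuming Pydantic v2. Several details are assumptions made for the demo rather than part of the original: Resource gets a constructor, resolve_resource records its calls in a hypothetical fetch_log list instead of doing real I/O, and the input field is spelled res_in without a leading underscore, since Pydantic v2 does not allow Field() on underscore-prefixed names.

```python
from typing import List, Optional

from pydantic import BaseModel, ConfigDict, Field, PrivateAttr, model_validator


class Resource:
    def __init__(self, key: int) -> None:  # constructor added for the demo
        self.key = key


fetch_log: List[int] = []  # hypothetical: records every simulated fetch


def resolve_resource(key: int) -> Resource:
    fetch_log.append(key)  # a real implementation would query a database or API
    return Resource(key)


class ResourceCarrier(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)

    ref_id: Optional[int] = None
    res_in: Optional[Resource] = Field(default=None, alias="resource", exclude=True)
    _res_cache: Optional[Resource] = PrivateAttr(default=None)

    @model_validator(mode="after")
    def _xor_and_populate(self) -> "ResourceCarrier":
        if (self.ref_id is None) == (self.res_in is None):
            raise ValueError("provide exactly one of ref_id or resource")
        if self.res_in is not None:
            self.ref_id = self.res_in.key
        return self

    @property
    def resource(self) -> Resource:
        if self._res_cache is None:
            if self.res_in is not None:
                self._res_cache = self.res_in
            else:
                assert self.ref_id is not None
                self._res_cache = resolve_resource(self.ref_id)
        return self._res_cache


by_id = ResourceCarrier(ref_id=5)
calls_after_init = len(fetch_log)  # construction alone performs no fetch
first = by_id.resource             # first access triggers the fetch
second = by_id.resource            # second access is served from the cache

supplied = Resource(7)
by_obj = ResourceCarrier(resource=supplied)
same_object = by_obj.resource is supplied  # the supplied object is reused, no fetch
```

After this runs, fetch_log holds a single entry for key 5: the ID-only carrier fetched once on first access, and the carrier built from an object never fetched at all.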

Why this matters

This pattern solves three practical concerns at once. It enforces the “exactly one input” rule in a single place, which prevents ambiguous state. It keeps the network or database call lazy, so you only pay for I/O when the object is actually needed. And it guarantees that ref_id is populated after validation in both cases, since the validator copies it from the object when one is supplied. The annotation itself stays Optional[int], which is why the property narrows it with an assert before fetching, but downstream code can rely on the value being present.
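The XOR rule itself can be exercised separately from the lazy loading. The following is a trimmed-down sketch assuming Pydantic v2; the EitherRef model, its payload field (a dict standing in for the full object), and the is_accepted helper are all hypothetical names introduced for illustration.

```python
from typing import Optional

from pydantic import BaseModel, ValidationError, model_validator


class EitherRef(BaseModel):  # hypothetical, reduced to the XOR rule only
    ref_id: Optional[int] = None
    payload: Optional[dict] = None  # stands in for the full object

    @model_validator(mode="after")
    def _exactly_one(self) -> "EitherRef":
        # Equal "is None" results mean both are missing or both are present.
        if (self.ref_id is None) == (self.payload is None):
            raise ValueError("provide exactly one of ref_id or payload")
        return self


def is_accepted(**kwargs) -> bool:
    """Return True if EitherRef validates with the given keywords."""
    try:
        EitherRef(**kwargs)
    except ValidationError:  # Pydantic wraps the validator's ValueError
        return False
    return True


results = {
    "neither": is_accepted(),
    "id only": is_accepted(ref_id=1),
    "object only": is_accepted(payload={"key": 1}),
    "both": is_accepted(ref_id=1, payload={"key": 1}),
}
print(results)
```

Only the two single-input cases are accepted. Comparing the two "is None" results for equality is a compact way to express XOR: the check fails exactly when the inputs are both absent or both present.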

Takeaways

When a model can be identified in two interchangeable ways, embrace a real field for the key data, accept the alternate form via an alias, and centralize the XOR check in a validator. Keep an internal cache for the resolved object and expose a property that performs lazy loading. This keeps validation explicit, defers expensive work, and lets the rest of the code rely on a stable, predictable shape.

The article is based on a question from StackOverflow by Engensmax and an answer by Dmitry543.