2025, Dec 24 05:00

Type-safe validation of sorted iterables in Python: pass-through generator with key function, Protocol, and overloads

Learn how to type a Python generator that validates sorted streams, using a Comparable Protocol, overloads for optional key functions, and precise type hints.

Validating that an incoming stream is sorted while passing values through is a neat pattern, but it comes with a typing twist: the constraints differ depending on whether a key function is provided. Below is a practical way to express those constraints in type hints without sacrificing the generator’s behavior.

Problem

You want a generator that yields items unchanged, checks sortedness on the fly, and raises a ValueError if the order is broken. The runtime logic is straightforward; the challenge is to make static typing reflect that items themselves must be comparable if no key is given, and that the result of key(item) must be comparable if a key is present.

def ensure_sorted(stream, key=None):
    first_seen = True
    for elem in stream:
        if first_seen:
            last_key = key(elem) if key else elem
            first_seen = False
        else:
            cur_key = key(elem) if key else elem
            if last_key > cur_key:
                raise ValueError("not sorted")
            last_key = cur_key
        yield elem

# Simple smoke checks
def run_demo():
    list(ensure_sorted([1, 2, 2, 3]))
    list(ensure_sorted([]))
    import pytest
    with pytest.raises(ValueError):
        list(ensure_sorted([2, 1]))

What’s really going on

There are two modes and they have different typing rules. If key is omitted, the iterable must yield comparable objects. If key is provided, the iterable can yield any type, but key(item) must be comparable. That split can’t be expressed with a single signature. It needs distinct overloads keyed by the presence of key.

To capture “comparable” in type hints, define a Protocol that requires __lt__. This avoids relying on specific concrete types and lets static checkers validate both branches. Inside the implementation, a small cast is necessary to convince the type checker that the comparison operands are indeed comparable in both modes.

Solution

The following expresses the constraints with a Comparable-like protocol, two overloads, and a single implementation that uses cast exactly at the comparison boundary.

from abc import abstractmethod
from typing import Callable, TypeVar, Iterable, Generator, Protocol, cast, overload

class Orderable(Protocol):
    """Protocol for types that support ordering via <."""

    @abstractmethod
    def __lt__(self: "OC", other: "OC", /) -> bool:
        ...

U = TypeVar("U")
OC = TypeVar("OC", bound=Orderable)

# No key: items themselves must be comparable; the generator yields the same type back.
@overload
def ensure_sorted(stream: Iterable[OC], key: None = None) -> Generator[OC, None, None]: ...

# With key: any input type is allowed; key(item) must be comparable; the generator still yields input items.
@overload
def ensure_sorted(stream: Iterable[U], key: Callable[[U], OC]) -> Generator[U, None, None]: ...


def ensure_sorted(stream: Iterable[U], key: Callable[[U], OC] | None = None) -> Generator[U, None, None]:
    first_seen = True
    for elem in stream:
        if first_seen:
            last_key = cast(OC, key(elem) if key else elem)
            first_seen = False
        else:
            cur_key = cast(OC, key(elem) if key else elem)
            if last_key > cur_key:
                raise ValueError("not sorted")
            last_key = cur_key
        yield elem

This arrangement relies on overloads to describe the two modes, a Protocol to define what “comparable” means, and a localized cast so static checkers treat both branches as producing the same comparable key type.

Usage examples that static checkers can validate

from dataclasses import dataclass
import functools

@dataclass
class Alpha:
    ...

@functools.total_ordering
@dataclass
class Beta:
    def __lt__(self, other, /) -> bool:
        ...

# No key: items must be comparable.
ensure_sorted([Alpha()])   # type error: Alpha is not orderable
ensure_sorted([Beta()])    # type ok

# With key: the key must accept the item type and return a comparable type.
def pick_int(n: int) -> int:
    ...
ensure_sorted([Beta()], key=pick_int)  # type error: key does not accept Beta

def pick_beta(b: Beta) -> int:
    ...
ensure_sorted([Beta()], key=pick_beta)  # type ok

def pick_alpha(a: Alpha) -> int:
    ...
ensure_sorted([Alpha()], key=pick_alpha)  # type ok

def pick_any(a: Alpha) -> object:
    ...
ensure_sorted([Alpha()], key=pick_any)  # type error: object is not known to be comparable

Why this matters

With the overloads in place, static analyzers can catch the real integration mistakes: passing non-orderable items without a key, providing a key with the wrong parameter type, and returning non-comparable values from the key. The runtime behavior of raising a ValueError on unsorted input stays intact, even though that exception cannot be expressed in annotations. Likewise, there is no way to distinguish between sorted and unsorted iterables at the type level, because the notion of “sorted” is data-dependent and, with a key, context-dependent.

Takeaways

Model comparability with a Protocol centered on __lt__, split behavior with overloads depending on the presence of key, and use a narrow cast at the point where both branches join. Ensure that critical pieces such as the generator function and any ordering methods like __lt__ are fully annotated. This combination keeps the pass-through generator clean while providing precise, practical feedback from type checkers.