2025, Nov 06 05:00

How to Type a Python Algorithm Factory: Variance Pitfalls, Literal Overloads, and Read-Only Iterables

Learn how to type a Python factory that returns lists: avoid list invariance, use Protocol and TypeVar, Literal overloads, or return a read-only iterable.

When you build a generic framework that wires data models and algorithms together, one of the first frictions you hit is typing the factory method that selects the right set of algorithm instances based on a config value. Everything shares a common interface, but the concrete data and algorithms differ per problem. The intent is simple: return a collection of algorithms that all agree on the same data shape. The typing rules around variance make that deceptively tricky.

Example setup that triggers the typing question

The following sketch shows a shared protocol for data and algorithms, two concrete payloads, and three concrete algorithms. The factory branches on a runtime key and returns a list of algorithms tuned for a specific payload. The only open question is how to type the factory’s return value.

from dataclasses import dataclass
from typing import Protocol, TypeVar
# Protocols
class CoreData(Protocol):
    common: int
D = TypeVar("D", bound=CoreData)
class AlgoBase(Protocol[D]):
    def update(self, data: D) -> None: ...
# Data implementations
@dataclass
class PayloadOne:
    common: int
    extra: int
@dataclass
class PayloadTwo:
    common: int
    extra: str
# Algorithm implementations
class ProcOne:
    def update(self, data: PayloadOne) -> None:
        data.extra += data.common
class ProcTwoA:
    def update(self, data: PayloadTwo) -> None:
        data.extra *= data.common
class ProcTwoB:
    def update(self, data: PayloadTwo) -> None:
        data.extra += "2b"
# Factory returning different lists depending on a runtime key
class AlgoFactory:
    def _build_one(self) -> list[AlgoBase[PayloadOne]]:
        return [ProcOne()]
    def _build_two(self) -> list[AlgoBase[PayloadTwo]]:
        return [ProcTwoA(), ProcTwoB()]
    def make(self, kind: int):  # What should the return annotation be?
        match kind:
            case 1:
                return self._build_one()
            case 2:
                return self._build_two()
            case _:
                raise ValueError(f"Unknown type of data {kind}")

Why list[AlgoBase[CoreData]] feels right but fails

The goal is “a collection of algorithms that operate on one consistent data type.” It is tempting to declare a single umbrella type like list[AlgoBase[CoreData]]. This is rejected for good reason. The type parameter of list is invariant, so list[X] is neither a subtype nor a supertype of list[Y] even if X and Y are related. Meanwhile the algorithm protocol accepts its type parameter in the update method, which means it must be contravariant in that parameter. Even after making the algorithm protocol contravariant, the outer list keeps the whole expression invariant. The result is that list[AlgoBase[PayloadOne]] is not a subtype of list[AlgoBase[CoreData]], and static checking correctly refuses to unify them under that common supertype.

There is another gotcha worth making explicit. A union inside the list is not the same as a union of lists. Put differently, list[A | B] is not equivalent to list[A] | list[B]. This matters for the factory signature: you cannot collapse different concrete lists into one list over a union and expect it to type-check the same way.

Two practical typings for the factory

The first option is to use overloads keyed by Literal values. This is a common and precise way to encode a factory’s branching behavior. Call sites that pass a known literal get an exact return type, and the implementation returns a union that captures the cases.

from typing import overload, Literal, TypeVar, Protocol
class CoreData(Protocol):
    common: int
D = TypeVar("D", bound=CoreData, contravariant=True)
class AlgoBase(Protocol[D]):
    def update(self, data: D) -> None: ...
@dataclass
class PayloadOne:
    common: int
    extra: int
@dataclass
class PayloadTwo:
    common: int
    extra: str
class ProcOne:
    def update(self, data: PayloadOne) -> None:
        data.extra += data.common
class ProcTwoA:
    def update(self, data: PayloadTwo) -> None:
        data.extra *= data.common
class ProcTwoB:
    def update(self, data: PayloadTwo) -> None:
        data.extra += "2b"
GroupOne = list[AlgoBase[PayloadOne]]
GroupTwo = list[AlgoBase[PayloadTwo]]
class AlgoFactory:
    def _build_one(self) -> GroupOne:
        return [ProcOne()]
    def _build_two(self) -> GroupTwo:
        return [ProcTwoA(), ProcTwoB()]
    @overload
    def make(self, kind: Literal[1]) -> GroupOne: ...
    @overload
    def make(self, kind: Literal[2]) -> GroupTwo: ...
    def make(self, kind: int) -> GroupOne | GroupTwo:
        if kind == 1:
            return self._build_one()
        if kind == 2:
            return self._build_two()
        raise ValueError(f"Unknown type of data {kind}")

The second option is to return an immutable, read-only view via a Protocol that exposes only iteration. This approach avoids list’s invariance by switching to a covariant container such as tuple and specifying that consumers only need to iterate and call update. The protocol only declares __iter__, and you can return a tuple built from the list produced by each branch.

from dataclasses import dataclass
from typing import Protocol, TypeVar, Iterator
class CoreData(Protocol):
    common: int
D = TypeVar("D", bound=CoreData, contravariant=True)
class AlgoBase(Protocol[D]):
    def update(self, data: D) -> None: ...
@dataclass
class PayloadOne:
    common: int
    extra: int
@dataclass
class PayloadTwo:
    common: int
    extra: str
class ProcOne:
    def update(self, data: PayloadOne) -> None:
        data.extra += data.common
class ProcTwoA:
    def update(self, data: PayloadTwo) -> None:
        data.extra *= data.common
class ProcTwoB:
    def update(self, data: PayloadTwo) -> None:
        data.extra += "2b"
class AlgoFactory:
    def _build_one(self) -> list[AlgoBase[PayloadOne]]:
        return [ProcOne()]
    def _build_two(self) -> list[AlgoBase[PayloadTwo]]:
        return [ProcTwoA(), ProcTwoB()]
class Batch(Protocol):
    def __iter__(self) -> Iterator[AlgoBase[CoreData]]: ...
class AlgoFactory:
    def _build_one(self) -> list[AlgoBase[PayloadOne]]:
        return [ProcOne()]
    def _build_two(self) -> list[AlgoBase[PayloadTwo]]:
        return [ProcTwoA(), ProcTwoB()]
    def make(self, kind: int) -> Batch:
        match kind:
            case 1:
                return tuple(self._build_one())
            case 2:
                return tuple(self._build_two())
            case _:
                raise ValueError(f"Unknown type of data {kind}")

The core idea is that clients do not need mutation on the returned collection. By exposing only iteration and returning an immutable sequence, you sidestep the list invariance and keep the API focused on what the callers actually do: iterate and invoke update. As for the protocol, it only declares __iter__; nothing else needs to be implemented in the protocol body.

Why this matters for real projects

When you scale a framework like this, the number of payloads and algorithms grows quickly. A return type like list[AlgoBase[CoreData]] looks maintainable at first, yet it subtly undermines static guarantees you want from a type checker. Being precise with overloads gives call sites exact types when the key is known, and adopting a read-only protocol communicates intent while preventing accidental list mutations. Another important aspect is recognizing that list invariance and union distribution rules are easy to misapply. Being explicit about them saves time as the codebase evolves.

Takeaways

If the calling code knows the factory key at compile time, prefer overloads keyed by Literal values and return a union of concrete list types from the implementation. This keeps the typing accurate without forcing a single umbrella type that will not check. If callers only need to iterate, move to an immutable, covariant view and publish a protocol that exposes just iteration; returning a tuple of algorithms fits naturally with that contract. In both variants the algorithms operate on a single, consistent data type per branch, which was the original design goal.

The article is based on a question from StackOverflow by Durtal and an answer by Dmitry543.