2025, Sep 22 15:00

Dataclasses vs dependent defaults in Python: practical patterns to keep editable configs without boilerplate

Why default_factory can't depend on other fields, and how post_init, plain classes, or classmethod builders keep Python configs editable, clear, and lean.

Designing a Python config layer that mid-level users can safely adapt is a balancing act. In scenarios where Python objects represent DSL models, a natural wish is to keep all field definitions in one place and let some defaults depend on other fields. That looks neat on paper, but quickly collides with how dataclasses evaluate defaults. Let’s unpack the friction and show practical ways to keep the code approachable for non-expert editors.

The minimal example that looks right but fails

Suppose we want a class where one field’s default is a function of another field. The first attempt typically looks like this:

from enum import Enum
import typing as tp
import dataclasses as dc
import random


class Phase(Enum):
  ENTER = "incoming"
  EXIT = "outgoing"
  NAP = "sleeping"


def picker_from_enum[U: Enum](K: type[U]) -> tp.Callable[[], U]:
  """Returns a callable that yields a random member of enum K."""
  return lambda: random.choice(list(K))


mean_lag = {
  Phase.ENTER: 10,
  Phase.EXIT: 20,
  Phase.NAP: 100,
}


def delay_sampler_from_phase(p: Phase) -> tp.Callable[[], int]:
  """Returns a callable that yields a random delay centered around mean_lag[p]."""
  return lambda: int(random.gauss(mean_lag[p]))


@dc.dataclass
class RandomQuery:
  mode: Phase = dc.field(default_factory=picker_from_enum(Phase))
  # Broken: at this point "mode" names a dataclasses.Field object, not a Phase
  lag: int = dc.field(default_factory=delay_sampler_from_phase(mode))


if __name__ == "__main__":
  print(RandomQuery())

This does not work because the expression passed to default_factory for lag is evaluated while the class body runs. There is no instance yet, so there is no instance attribute mode to read; the name mode in the class namespace is bound to the dataclasses.Field object that dc.field(...) returned, and that is what delay_sampler_from_phase receives. Similar attempts with self.mode or cls.mode won’t help, and swapping the factory for a direct default won’t change the timing problem.
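A quick way to see this timing, sketched with a made-up one-field class and an inspect_then_factory helper that stands in for delay_sampler_from_phase: whatever sits next to a field in the class body is the Field descriptor, not a value.

```python
import dataclasses as dc

captured = []  # records what the factory expression saw at class creation time

def inspect_then_factory(value):
  # Called while the class body executes, just like delay_sampler_from_phase(mode)
  captured.append(type(value).__name__)
  return lambda: 0

@dc.dataclass
class Demo:
  mode: int = dc.field(default=1)
  lag: int = dc.field(default_factory=inspect_then_factory(mode))

print(captured)  # prints ['Field']: the neighbor was a Field object, not an int
```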

What’s really going on

Dataclasses shine when the caller passes all public fields directly. In that standard flow, a call like RandomQuery(mode=Phase.ENTER, lag=5) won’t touch any default_factory at all. More importantly here, dataclass field declarations are evaluated as the class is created, so anything in default_factory=... must be a ready-to-use zero-argument callable, not a computation that references another field on the future instance.
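That claim is easy to check with a hypothetical counting factory: constructing with an explicit value never invokes the factory, while a bare call does.

```python
import dataclasses as dc

calls = []  # grows by one each time the factory actually runs

def counted_factory():
  calls.append(1)
  return 0

@dc.dataclass
class Q:
  lag: int = dc.field(default_factory=counted_factory)

Q(lag=5)   # explicit value: factory skipped
assert calls == []
Q()        # no argument: factory runs once
assert calls == [1]
```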

Because of this, if a field depends on another field, the idiomatic way is to compute it after the instance is created. Dataclasses provide __post_init__ for that.

A workable pattern with dataclasses

You can keep the dataclass and compute the dependent values in one place right after construction:

import dataclasses as dc

# Phase, picker_from_enum, and delay_sampler_from_phase as defined above

@dc.dataclass
class RandomQuery:
  mode: Phase = dc.field(default_factory=picker_from_enum(Phase))
  lag: int = dc.field(init=False)

  def __post_init__(self):
    self.lag = delay_sampler_from_phase(self.mode)()

This keeps runtime behavior correct and makes the dependency explicit. However, if you have many dependent fields, the post-init block grows fast. That was exactly the pain: with several groups of related properties, the class becomes hard to scan and error-prone for editors who just want to tweak inputs.

When a plain class is simpler

If the common usage is to create objects with no parameters and let the class itself populate values, a regular class has less ceremony and keeps all the initialization logic in one obvious place:

# not a dataclass
class RandomQuery:
  def __init__(self):
    self.mode = random.choice(list(Phase.__members__.values()))
    self.lag = int(random.gauss(mean_lag[self.mode]))

This design is straightforward for mid-level users: all the moving parts live in __init__, and it avoids the split between field declarations and a long post-init.

Dataclass plus a secondary constructor

If the primary workflow passes all fields explicitly, but you also want a convenient “randomized” way to build instances, a classmethod as a secondary constructor works cleanly:

from dataclasses import dataclass
from typing import Self  # Python 3.11+

# Phase and mean_lag as defined above

@dataclass
class RandomQuery:
  mode: Phase
  lag: int

  @classmethod
  def random(cls) -> Self:
    mode = random.choice(list(Phase.__members__.values()))
    lag = int(random.gauss(mean_lag[mode]))
    return cls(mode=mode, lag=lag)

This keeps the dataclass semantics intact for the “normal path” while still offering a one-liner to spin up randomized objects. As a small tip, instead of list(Phase.__members__.values()) you can also use list(Phase).

There is another direction that sometimes comes up: if you ever need to generate random values generically across multiple dataclasses, dataclasses.fields() lets you introspect declared fields. That can be useful for a bespoke fuzzer or generator, although it’s often overkill unless you are actually building tooling around that.
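As a rough sketch of that direction (random_instance and the sampler table are invented for illustration), dataclasses.fields() reports the declared field names, which a generic builder can map to per-field samplers:

```python
import dataclasses as dc
import random
import typing as tp

T = tp.TypeVar("T")

def random_instance(cls: type[T], samplers: dict[str, tp.Callable[[], tp.Any]]) -> T:
  # Hypothetical helper: build cls by drawing each declared field from its sampler
  return cls(**{f.name: samplers[f.name]() for f in dc.fields(cls)})

@dc.dataclass
class Query:
  mode: str
  lag: int

q = random_instance(Query, {
  "mode": lambda: random.choice(["incoming", "outgoing"]),
  "lag": lambda: random.randint(5, 15),
})
print(q)
```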

Why this matters for an editable DSL config layer

The whole point of the setup is to let mid-level users extend or adjust the configuration surface without touching the core generator. If the class that they edit is concise and predictable, they can add scenarios with confidence. If it sprawls across many init=False declarations and a long post-init with repeated lines, small mistakes slip in easily. Choosing between a plain class initializer, a dataclass with post-init, or a dataclass with a dedicated random constructor is less about style and more about matching how the class will be used most of the time.

Practical advice and takeaways

If you want dependent defaults inside the same dataclass declaration, Python won’t let you reference one field from another’s default_factory. The closest idiomatic alternative is to assign dependent values in __post_init__. If the common call path supplies no arguments and everything is randomized or derived, a regular class puts all initialization logic in one place and is easier to read and maintain. If the common call path supplies explicit values but you still want a convenient randomized builder, make a classmethod constructor that returns an instance with computed values.

In setups with many similarly shaped fields that grow over time, consider whether the flat, numbered shape is doing you any favors. Grouping related values and pushing generation into helpers often reduces repetition and makes the intent clearer to the next person editing the file.
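One hedged sketch of that grouping idea (the Lags container and sample_lags helper are invented for illustration): related values move into a nested dataclass built by an independent helper, which can be a plain default_factory because it depends on no sibling field.

```python
import dataclasses as dc
import random

@dc.dataclass
class Lags:
  enter: int
  exit: int

def sample_lags() -> Lags:
  # Independent of any other Query field, so it works as a default_factory
  return Lags(enter=int(random.gauss(10, 3)), exit=int(random.gauss(20, 3)))

@dc.dataclass
class Query:
  lags: Lags = dc.field(default_factory=sample_lags)

print(Query())  # one field to scan instead of a flat run of lag_* declarations
```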

The short version: pick the construction pattern that matches how your objects are normally created, keep dependent computations in one well-defined spot, and favor structures that mid-level users can scan and modify without chasing logic across the class.

The article is based on a question from StackOverflow by globglogabgalab and an answer by David Maze.