2025, Dec 12 09:00

Preserving _metadata in pandas DataFrame subclasses with pyjanitor: avoid AttributeError in method chains

Learn why pyjanitor drops pandas DataFrame subclass _metadata and how to fix it with a carry_meta helper to preserve attributes and avoid pipeline errors.

Custom DataFrame subclasses in pandas are a powerful way to attach domain-specific state to your data via _metadata. But once you bring pyjanitor into the chain, that state can silently vanish. The result is a confusing AttributeError in the middle of an otherwise clean method pipeline.

Minimal example that reproduces the issue

The following snippet defines a DataFrame subclass with a single custom attribute, then pushes it first through a pandas-native operation (query) and then through a pyjanitor function (complete). The attribute survives the pandas-native step, but disappears after the janitor call.

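import pandas as pd
import janitor  # noqa: F401  (importing registers pyjanitor's DataFrame methods, e.g. complete)
import pandas_flavor as pf


# See: https://pandas.pydata.org/pandas-docs/stable/development/extending.html#define-original-properties
class CustomFrame(pd.DataFrame):
    # Attributes named in _metadata are propagated by pandas-native operations.
    _metadata = ["flag"]

    @property
    def _constructor(self):
        # Make pandas-native operations return CustomFrame instead of DataFrame.
        return CustomFrame


@pf.register_dataframe_method
def setflag(self):
    # Wrap the frame in the subclass and attach the custom attribute.
    new_obj = CustomFrame(self)
    new_obj.flag = 2
    return new_obj


@pf.register_dataframe_method
def showflag(self):
    # Print the attribute and return the frame so the chain can continue.
    print(self.flag)
    return self


frame = pd.DataFrame(
    {
        "Year": [1999, 2000, 2004, 1999, 2004],
        "Taxon": [
            "Saccharina",
            "Saccharina",
            "Saccharina",
            "Agarum",
            "Agarum",
        ],
        "Abundance": [4, 5, 2, 1, 8],
    }
)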

This works and prints 2, because the subclass and its metadata survive:

ok = frame.setflag().query("Taxon=='Saccharina'").showflag()

But this raises AttributeError: 'DataFrame' object has no attribute 'flag', because complete returns a plain DataFrame, without your subclass or its metadata:

idx = pd.Index(range(1999, 2005), name="Year")
bad = frame.setflag().complete(idx, "Taxon", sort=True).showflag()

What’s going on

pyjanitor functions commonly build their results from scratch with pandas.DataFrame, or by combining data through operations such as pandas.merge. In the first case, you get a fresh DataFrame that carries only the default _metadata. In the second, the returned object may not retain your subclass at all, and there is no guarantee that its metadata will be preserved. This behavior stems from how the pyjanitor functions are implemented and is not a bug in pandas itself. There are related discussions in the pandas-dev repository around _metadata, but the short version is simple: these janitor functions do not automatically propagate the subclass or its attributes.
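To make the first case concrete, here is a minimal sketch that uses only pandas. TaggedFrame and rebuild_from_scratch are hypothetical names introduced for illustration; rebuild_from_scratch is a stand-in for any library function that constructs its result with plain pandas.DataFrame, not pyjanitor's actual implementation.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical subclass used only for this illustration.
    _metadata = ["flag"]

    @property
    def _constructor(self):
        return TaggedFrame


def rebuild_from_scratch(df):
    # Stand-in for a function that builds its result with plain
    # pd.DataFrame: the output has no link back to the input's class.
    return pd.DataFrame(df.to_dict("list"))


tagged = TaggedFrame({"a": [1, 2, 3]})
tagged.flag = 2

# pandas-native operation: goes through _constructor and copies _metadata.
native = tagged.query("a > 1")

# Rebuilt result: a fresh, plain DataFrame.
rebuilt = rebuild_from_scratch(tagged)

print(type(native).__name__, getattr(native, "flag", "missing"))
print(type(rebuilt).__name__, getattr(rebuilt, "flag", "missing"))
```

The first print should report the subclass with its flag, while the second should report a plain DataFrame with the attribute missing, which is the same symptom as in the complete example above.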

Practical fix with a custom piping helper

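With pandas 2.2.3 and pyjanitor 0.31.0, a reliable way to keep metadata intact is to wrap functions that may return a plain DataFrame and explicitly restore your subclass and its _metadata. The helper below does exactly that: it calls the wrapped function, re-instantiates the caller's class around the result, and copies over every attribute named in _metadata. It plugs directly into method chains.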

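from collections.abc import Callable


@pf.register_dataframe_method
def carry_meta(df_obj: pd.DataFrame, fn: Callable, *args, **kwargs):
    # Call fn with the frame as its first argument.
    result = fn(df_obj, *args, **kwargs)
    if isinstance(result, pd.DataFrame):
        # Re-wrap the result in the caller's class, then copy each
        # attribute listed in _metadata (None if it was never set).
        result = df_obj.__class__(result)
        for name in df_obj._metadata:
            setattr(result, name, getattr(df_obj, name, None))
    return result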

Use it to run the original task while preserving your custom attribute:

fixed = (
    frame
    .setflag()
    .carry_meta(janitor.complete, idx, "Taxon", sort=True)
    .showflag()
)
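If you would rather not register another DataFrame method, the same idea can be sketched as a plain function passed to the built-in DataFrame.pipe, which calls its first argument with the frame followed by the remaining arguments. This is an alternative sketch rather than the approach above: TaggedFrame and rebuild_plain are hypothetical names, and rebuild_plain stands in for any function that returns a plain DataFrame.

```python
import pandas as pd


class TaggedFrame(pd.DataFrame):
    # Hypothetical subclass used only for this illustration.
    _metadata = ["flag"]

    @property
    def _constructor(self):
        return TaggedFrame


def carry_meta(df_obj, fn, *args, **kwargs):
    """Call fn(df_obj, ...) and re-wrap a DataFrame result in df_obj's class."""
    result = fn(df_obj, *args, **kwargs)
    if isinstance(result, pd.DataFrame):
        result = type(df_obj)(result)
        for name in df_obj._metadata:
            setattr(result, name, getattr(df_obj, name, None))
    return result


def rebuild_plain(df, scale=1):
    # Stand-in for a third-party function that returns a plain DataFrame.
    return pd.DataFrame({"a": df["a"].to_numpy() * scale})


tagged = TaggedFrame({"a": [1, 2, 3]})
tagged.flag = 2

# pipe calls carry_meta(tagged, rebuild_plain, scale=10).
restored = tagged.pipe(carry_meta, rebuild_plain, scale=10)

print(type(restored).__name__, restored.flag)
```

The trade-off is mostly stylistic: pipe needs no registration step, while the registered carry_meta method reads slightly more naturally in a long chain.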

Why this matters

Method chaining is a core part of the pandas and pyjanitor workflows. If your pipelines depend on _metadata, for example to carry configuration or context between steps, then losing it mid-chain hurts both correctness and debuggability. A small helper that re-wraps results into your subclass and restores the attributes listed in _metadata keeps your transformations predictable end to end.

Takeaways

If you subclass pandas.DataFrame and rely on _metadata, be aware that pyjanitor methods may return plain DataFrames and drop your custom attributes as a result. When you know a function may create a brand-new object or combine data, route the call through a wrapper like carry_meta to re-instantiate your subclass and copy the attributes you care about. With that in place, your pipelines remain concise, your metadata remains intact, and your final steps—like printing or consuming that attribute—work as expected.