2025, Nov 16 11:00

How to fix AttributeError on scikit-learn estimator tags: use the Tags object, not a dict

Learn why custom scikit-learn estimators break with dict-based __sklearn_tags__ and AttributeError, and how to migrate to the new Tags object API with a fix.

Building a custom scikit-learn estimator is straightforward until you hit a subtle API change. A common pitfall in recent scikit-learn versions is implementing estimator tags as a plain Python dictionary. In environments expecting the new Tags object, this results in an AttributeError complaining that a dict has no attribute like requires_fit. Below is a practical walkthrough of the problem and its fix.

Reproducing the issue

The following minimal estimator looks correct at a glance, but fails as soon as scikit-learn queries its tags:

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.utils.validation import check_is_fitted, check_X_y


class PenalizedRegressor(BaseEstimator, RegressorMixin):
    """
    Minimal penalized regression-style estimator.
    """

    def __init__(self, tau: float = 0.5):
        self.tau = tau

    def _calc_params(self, X, y):
        # Placeholder fit: one coefficient per feature (column), not per sample.
        self.intercept_ = 0
        self.coef_ = self.tau * np.ones(X.shape[1])

    def fit(self, X: np.ndarray, y: np.ndarray):
        # Record column names only when X provides them (e.g. a DataFrame).
        if hasattr(X, "columns"):
            self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        X, y = check_X_y(X, y, accept_sparse=False, y_numeric=True, ensure_min_samples=2)
        self.n_features_in_ = X.shape[1]
        self._calc_params(X, y)
        self.is_fitted_ = True
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        check_is_fitted(self, ["coef_", "intercept_", "is_fitted_"])
        preds = np.dot(X, self.coef_) + self.intercept_
        return preds

    def __sklearn_tags__(self):
        # Problem: this returns a plain dict instead of a Tags object.
        tags = {
            "allow_nan": False,
            "requires_y": True,
            "requires_fit": True,
        }
        return tags


# Sample usage
X, y, true_coef = make_regression(n_samples=200, n_features=200, n_informative=25, bias=10, noise=5, random_state=42, coef=True)
reg = PenalizedRegressor()
reg.fit(X, y)
reg.predict(X)  # check_is_fitted looks up the tags here and hits the dict

Running this raises AttributeError: 'dict' object has no attribute 'requires_fit'. The dict does contain a requires_fit key, but scikit-learn reads the tag as an attribute of a Tags object (for example inside check_is_fitted), not as a dictionary key.

Why this happens

Recent scikit-learn internals no longer expect tags as a plain dict. The library now uses a dedicated Tags class with structured fields such as input_tags, target_tags, and top-level flags like requires_fit. If an estimator returns a dict, consumer code that accesses attributes (for example, tags.requires_fit) will fail because a dict has no such attribute. This manifests exactly as the AttributeError above.
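The difference between key access and attribute access can be reproduced without scikit-learn at all. The sketch below uses stand-in dataclasses (DemoTags, DemoInputTags, and DemoTargetTags are hypothetical names invented for this illustration, not scikit-learn classes) and a needs_fit function that plays the role of consumer code reading tags.requires_fit.

```python
from dataclasses import dataclass, field


# Stand-ins for illustration only; the real classes live in scikit-learn.
@dataclass
class DemoInputTags:
    allow_nan: bool = False


@dataclass
class DemoTargetTags:
    required: bool = False


@dataclass
class DemoTags:
    input_tags: DemoInputTags = field(default_factory=DemoInputTags)
    target_tags: DemoTargetTags = field(default_factory=DemoTargetTags)
    requires_fit: bool = True


def needs_fit(tags) -> bool:
    """Mimic consumer code that reads a tag through attribute access."""
    return tags.requires_fit


dict_tags = {"allow_nan": False, "requires_y": True, "requires_fit": True}
object_tags = DemoTags()

# Attribute lookup works on the structured object.
print(needs_fit(object_tags))

# The same lookup on a dict fails, even though the key exists.
try:
    needs_fit(dict_tags)
except AttributeError as exc:
    dict_error = str(exc)
    print(dict_error)
```

The dict holds the right value under the right name, yet the lookup still fails, because dictionary keys are not attributes.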

There is compatibility logic in scikit-learn 1.6.1 that can convert old-style tags to the new representation (function _to_new_tags in sklearn/utils/_tags.py), but this conversion is not something you should rely on, especially since it is removed in 1.7.0. The robust approach is to opt into the Tags object explicitly.

The fix

Return a proper Tags instance by delegating to super().__sklearn_tags__(), then set the fields you need. The old requires_y flag is now target_tags.required, the input-side NaN allowance is input_tags.allow_nan, and requires_fit remains a top-level attribute.

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.datasets import make_regression
from sklearn.utils.validation import check_is_fitted, check_X_y


class PenalizedRegressor(BaseEstimator, RegressorMixin):
    def __init__(self, tau: float = 0.5):
        self.tau = tau

    def _calc_params(self, X, y):
        # Placeholder fit: one coefficient per feature (column), not per sample.
        self.intercept_ = 0
        self.coef_ = self.tau * np.ones(X.shape[1])

    def fit(self, X: np.ndarray, y: np.ndarray):
        # Record column names only when X provides them (e.g. a DataFrame).
        if hasattr(X, "columns"):
            self.feature_names_in_ = np.asarray(X.columns, dtype=object)
        X, y = check_X_y(X, y, accept_sparse=False, y_numeric=True, ensure_min_samples=2)
        self.n_features_in_ = X.shape[1]
        self._calc_params(X, y)
        self.is_fitted_ = True
        return self

    def predict(self, X: np.ndarray) -> np.ndarray:
        check_is_fitted(self, ["coef_", "intercept_", "is_fitted_"])
        preds = np.dot(X, self.coef_) + self.intercept_
        return preds

    def __sklearn_tags__(self):
        # Start from the inherited Tags object and adjust only what differs.
        tags = super().__sklearn_tags__()
        tags.input_tags.allow_nan = False
        tags.target_tags.required = True  # replaces `requires_y`
        tags.requires_fit = True
        return tags


# Sample usage
X, y, true_coef = make_regression(n_samples=200, n_features=200, n_informative=25, bias=10, noise=5, random_state=42, coef=True)
reg = PenalizedRegressor()
reg.fit(X, y)
reg.predict(X)  # the tag lookup in check_is_fitted now succeeds

This aligns the estimator with the current tagging model. In environments where calling model._get_tags() previously failed, constructing tags via super().__sklearn_tags__() resolves the mismatch.
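If several estimators still carry dict-style tags, the three renames discussed here can be applied mechanically. The following is a minimal sketch, not part of scikit-learn: LEGACY_TAG_PATHS and apply_legacy_tags are hypothetical names, the mapping covers only the three keys from this article, and a SimpleNamespace stands in for the Tags object so the snippet runs on its own.

```python
from types import SimpleNamespace

# Old dict key -> attribute path on the tags object. Only the three keys
# discussed in this article are covered; other keys are an open assumption.
LEGACY_TAG_PATHS = {
    "allow_nan": ("input_tags", "allow_nan"),
    "requires_y": ("target_tags", "required"),
    "requires_fit": ("requires_fit",),
}


def apply_legacy_tags(tags, legacy):
    """Copy values from a dict-style tag mapping onto a tags object."""
    for key, value in legacy.items():
        if key not in LEGACY_TAG_PATHS:
            raise KeyError(f"no known mapping for legacy tag {key!r}")
        *parents, leaf = LEGACY_TAG_PATHS[key]
        target = tags
        for name in parents:
            target = getattr(target, name)
        setattr(target, leaf, value)
    return tags


# Stand-in object for demonstration; inside an estimator you would pass
# the result of super().__sklearn_tags__() instead.
demo_tags = SimpleNamespace(
    input_tags=SimpleNamespace(allow_nan=True),
    target_tags=SimpleNamespace(required=False),
    requires_fit=False,
)
legacy = {"allow_nan": False, "requires_y": True, "requires_fit": True}
apply_legacy_tags(demo_tags, legacy)
print(demo_tags.input_tags.allow_nan, demo_tags.target_tags.required, demo_tags.requires_fit)
```

Raising on unknown keys is a deliberate choice in this sketch: silently dropping a tag would hide exactly the kind of mismatch the migration is meant to remove.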

Why this matters

Tags drive key behaviors across the scikit-learn ecosystem: input validation, y-requirements, NaN handling, and whether an estimator must be fitted before use. Returning a dict worked in older releases but clashes with the modern attribute-based API. Given that transitional helpers exist only in specific versions and are removed later, relying on them is fragile. Implementing the Tags interface directly ensures your estimator integrates cleanly across toolchains and versions that expect the new structure.
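One way to catch a regression early is a small check in your own test suite. The sketch below is an assumption-laden illustration: check_tags_shape, REQUIRED_PATHS, DictTagged, and ObjectTagged are hypothetical names, and the two tiny classes only imitate estimators so the example runs without scikit-learn installed.

```python
from types import SimpleNamespace

# Attribute paths this article relies on; adjust to the tags you actually use.
REQUIRED_PATHS = ("requires_fit", "input_tags.allow_nan", "target_tags.required")


def check_tags_shape(estimator):
    """Return a list of problems found in the estimator's tags (empty if none)."""
    tags = estimator.__sklearn_tags__()
    if isinstance(tags, dict):
        return ["__sklearn_tags__ returned a dict instead of a tags object"]
    problems = []
    for path in REQUIRED_PATHS:
        obj = tags
        for name in path.split("."):
            if not hasattr(obj, name):
                problems.append(f"missing attribute: {path}")
                break
            obj = getattr(obj, name)
    return problems


class DictTagged:
    """Imitates an estimator with old dict-style tags."""

    def __sklearn_tags__(self):
        return {"requires_fit": True}


class ObjectTagged:
    """Imitates an estimator with attribute-style tags."""

    def __sklearn_tags__(self):
        return SimpleNamespace(
            requires_fit=True,
            input_tags=SimpleNamespace(allow_nan=False),
            target_tags=SimpleNamespace(required=True),
        )


dict_problems = check_tags_shape(DictTagged())
object_problems = check_tags_shape(ObjectTagged())
print(dict_problems)
print(object_problems)
```

In a real project you would call the helper on your own estimator instances and assert that the returned list is empty.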

Takeaways

If you see AttributeError related to tags on a custom estimator, the root cause is typically a dict-based tag implementation. Delegate to super().__sklearn_tags__(), update the structured fields you need—input_tags.allow_nan, target_tags.required, and requires_fit—and return the Tags object. This small change keeps your estimator compatible with scikit-learn’s current validation and inspection machinery and avoids breakage tied to deprecated tag formats.