2025, Nov 17 23:00
How to access R-squared and other model diagnostics in Python pyfixest: tidy() vs summary() and Feols attributes
Learn how to extract R-squared and global fit statistics from pyfixest in Python: use Feols attributes, distinguish tidy() from summary(), and avoid parsing the printed summary.
Extracting model diagnostics in Python often looks straightforward until you try doing it with a library that optimizes for statistical fidelity over ergonomic summaries. A common example is pyfixest: the regression summary shows all the right numbers, but not all of them are directly accessible for downstream processing. Users get neat coefficient tables via tidy(), yet essentials like R-squared and the number of observations aren’t obvious to retrieve programmatically from the fitted object.
Minimal example that highlights the issue
The workflow below runs an OLS with clustered inference and prints the standard summary alongside a tidy coefficient table. The diagnostics seen in the printed output aren’t all exposed through tidy().
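import pyfixest as fx
# model_expr (the formula) and data_slice (the data) are defined elsewhere
fit = fx.feols(model_expr, data_slice, vcov={"CRV1": "gvkey + dyear"})
# summary() prints the report itself, so no print() wrapper is needed
fit.summary()
# Coefficient-level results
coef_table = fit.tidy()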
The printed output includes blocks such as Estimation: OLS, the total Observations, and a footer with RMSE and R2, while tidy() returns only coefficient-level results: the estimates with their standard errors and test statistics.
What’s actually going on
In this setup, tidy() is intentionally scoped to coefficient-level results. That makes it convenient for combining or plotting estimates, but it doesn’t incorporate global fit statistics. Meanwhile, summary() renders a human-readable report that includes global metrics, yet that text is not designed for structured extraction.
The crucial detail is that some diagnostics are exposed on the fitted object itself rather than through tidy(). Specifically, the R2 values are available via an attribute documented on the model class.
Solution
R2 is accessible directly from the fitted object through its _r2 attribute. For other available fields, refer to the object's attribute reference.
Reference: the attributes of the Feols object are listed in the pyfixest documentation under Feols attributes.
import pyfixest as fx
fit = fx.feols(model_expr, data_slice, vcov={"CRV1": "gvkey + dyear"})
# Human-readable output (summary() prints the report itself)
fit.summary()
# Structured coefficients
coef_table = fit.tidy()
# Programmatic access to R-squared
r2_value = fit._r2
print(r2_value)
If you need to understand how the printed summary obtains and formats elements such as Observations or RMSE, inspect the implementation used by summary() and tidy(). The relevant sources are public: summary() formatting is implemented in summarize.py, and tidy() is defined in feols_.py. The attribute list for the model object is the canonical place to see what is already exposed for direct access.
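As a minimal sketch of collecting several diagnostics at once, the helper below reads a list of attributes from a fitted object into a plain dictionary. Only _r2 is confirmed in this article; the names _N and _rmse are assumptions that should be checked against the attribute reference, and collect_fit_stats is a hypothetical helper, not part of pyfixest. A stand-in object replaces a real fit so the snippet runs without data.

```python
from types import SimpleNamespace

# Only "_r2" is confirmed in the article; "_N" and "_rmse" are assumed
# names that should be verified against the Feols attribute reference.
DEFAULT_STATS = ("_N", "_r2", "_rmse")


def collect_fit_stats(fit, names=DEFAULT_STATS):
    """Return {public_name: value} for each attribute, None if missing."""
    return {name.lstrip("_"): getattr(fit, name, None) for name in names}


# Stand-in for a fitted model, with made-up values for illustration only.
fake_fit = SimpleNamespace(_N=1000, _r2=0.42)

stats = collect_fit_stats(fake_fit)
print(stats)
```

With a real model, pass the fitted object instead of fake_fit. A None in the result signals that the attribute name is not present on the object and needs checking.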
Why this matters
Accessible model diagnostics are more than a cosmetic convenience; they enable reliable pipelines. When metrics like R2 can be retrieved programmatically, they can be logged, validated, or passed into monitoring dashboards without brittle parsing of printed text. Keeping the distinction between formatted summaries, coefficient tables, and model attributes clear helps avoid dead ends when you integrate econometric models into data workflows.
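One way this can look in practice is sketched below: fit statistics are validated and then serialized as a JSON log line. The function name diagnostics_record, the field names, the model label, and the numeric values are all hypothetical; with pyfixest the R2 value would come from fit._r2.

```python
import json
import math


def diagnostics_record(model_name, r2, n_obs=None):
    """Validate fit statistics and return them as a JSON string."""
    if r2 is None or math.isnan(r2):
        raise ValueError("R2 is missing")
    if r2 > 1.0:
        raise ValueError(f"R2 above 1 is implausible: {r2}")
    record = {"model": model_name, "r2": round(float(r2), 6)}
    if n_obs is not None:
        record["n_obs"] = int(n_obs)
    return json.dumps(record, sort_keys=True)


# Illustrative values; in a real pipeline r2 would be fit._r2.
line = diagnostics_record("baseline_ols", r2=0.4173, n_obs=1000)
print(line)
```

Failing loudly on a missing or implausible value keeps bad fits from reaching a dashboard unnoticed, which is harder to guarantee when the number is scraped from formatted text.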
Takeaways
Use tidy() for coefficient-level outputs. For global fit statistics, access the model's attributes directly, including R2 via fit._r2. When in doubt about where a value comes from, check the summary() and tidy() implementations and the official attribute reference. Adding a short snippet that retrieves diagnostics to your starter scripts will save time and reduce ambiguity in production code.