2025, Sep 23 21:00

Pandas NaN from NumPy vs Pydantic Literal[np.nan]: Understanding the Validation Failure and Two Fixes

Learn why Pandas hands back a NumPy NaN that breaks Pydantic's Literal[np.nan], and fix it with validators: normalize the NaN before the literal match, or accept any NaN without loosening the schema.

Pandas happily uses NaN to represent missing numeric data, but when that NaN comes out of a Series it is a freshly created NumPy scalar (numpy.float64), not the np.nan object itself. If you feed those values into a Pydantic model typed with Literal[np.nan], validation fails in surprising ways. Here’s how to reproduce it, understand why it happens, and fix it cleanly without loosening your schema more than necessary.

Minimal reproduction of the validation failure

Start with the typical Pandas pattern: create an integer Series, reindex to introduce a missing value, and iterate over the underlying values to validate them through a Pydantic model that accepts either a positive integer or NaN.

import pandas as pd
import numpy as np
import pydantic
from typing import Union, Literal

# Integer series, then reindex to introduce a missing value at the tail
ser_missing = pd.Series([1, 2], dtype=np.int64).reindex([0, 1, 2])

# Model: "positive integer" OR literal NaN
class Gauge(pydantic.BaseModel):
    granularity: Union[pydantic.conint(ge=1), Literal[np.nan]]

# Iterate over values; the last item is a NaN from Pandas
for item in ser_missing.values:
    # This raises a ValidationError on the NaN coming from Pandas
    Gauge(granularity=item)

Calling the model directly with np.nan works, yet validating the NaN that came from the Series fails. That means the object Pandas handed us is not the same as the object matched by Literal[np.nan].
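A quick inspection confirms what the Series is actually handing over (reusing the setup above, rebuilt here so the snippet stands alone):

```python
import numpy as np
import pandas as pd

ser_missing = pd.Series([1, 2], dtype=np.int64).reindex([0, 1, 2])

print(ser_missing.dtype)                 # float64: reindex upcasts int64 to hold NaN
print(type(ser_missing.values[-1]))      # <class 'numpy.float64'>
print(ser_missing.values[-1] is np.nan)  # False: a fresh scalar, not the np.nan object
```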

What actually goes wrong

Pandas returns a NumPy scalar for the missing value, specifically numpy.float64('nan'). Although numpy.float64 is a subclass of Python's float, NaN never compares equal to anything, itself included, so an equality-based Literal check can never match a NaN. The only way Literal[np.nan] can pass is when the input is the exact np.nan object, matched by identity. Passing np.nan directly hands over that very object, which is why it validates; the Series yields a fresh numpy.float64 scalar, a different object, so the literal check rejects it.
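The behavior is easy to confirm with nothing but the standard library: equality can never match a NaN, so the only check that can succeed is identity.

```python
import math

nan_a = math.nan       # one particular NaN object
nan_b = float('nan')   # a different NaN object

print(nan_a == nan_a)   # False: NaN never compares equal, even to itself
print(nan_a is nan_a)   # True: identity is the only check that can succeed
print(nan_a in {nan_a}) # True: containment short-circuits on identity
print(nan_b in {nan_a}) # False: different object, and == can never rescue it
```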

Two precise ways to fix it

The first strategy is to normalize any NaN input to the exact np.nan object before Literal[np.nan] runs. Because NaN never compares equal, the literal can only match by identity, so the validator must hand back the very object the annotation was declared with rather than a fresh float('nan').

from typing import Union, Literal
from typing_extensions import Annotated
from pydantic import BaseModel, PositiveInt, BeforeValidator
import numpy as np
import math

# Normalize any NaN (Python float or NumPy scalar) to the np.nan object
# itself; a fresh float('nan') would fail the identity-based literal match
def to_nan_singleton(value):
    if isinstance(value, (float, np.floating)) and math.isnan(float(value)):
        return np.nan
    return value

OnlyNaN = Annotated[Literal[np.nan], BeforeValidator(to_nan_singleton)]

class GaugeFixed(BaseModel):
    granularity: Union[PositiveInt, OnlyNaN]

# Usage with Pandas values
for val in ser_missing.values:
    GaugeFixed(granularity=val)

The second strategy is to define an "any NaN" type that accepts either Python float NaN or NumPy NaN via a validator. This avoids a Literal match entirely and focuses on the semantic property “is NaN”.

from typing import Union
from typing_extensions import Annotated
from pydantic import BaseModel, PositiveInt, AfterValidator
import numpy as np
import math

# Accepts any NaN (Python or NumPy) and normalizes to float('nan')
def require_nan(value):
    if isinstance(value, (float, np.floating)) and math.isnan(float(value)):
        return float('nan')
    raise ValueError('not NaN')

AnyNaN = Annotated[float, AfterValidator(require_nan)]

class GaugeStrict(BaseModel):
    granularity: Union[PositiveInt, AnyNaN]

# Usage with Pandas values
for val in ser_missing.values:
    GaugeStrict(granularity=val)

Both approaches target the root cause: Literal[np.nan] can only match NaN by identity, and the value coming out of the Series is a different object. If you would rather not depend on identity semantics at all, prefer the "any NaN" path, since it checks the semantic property directly.
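As a sanity check on the second approach, here is a self-contained sketch (pydantic v2 names, same validator as above) showing that the union still rejects ordinary floats rather than loosening the schema:

```python
import math
from typing import Union
from typing_extensions import Annotated
from pydantic import BaseModel, PositiveInt, AfterValidator, ValidationError

# Runs after pydantic's float validation, so `value` is already a float
def require_nan(value):
    if math.isnan(value):
        return float('nan')
    raise ValueError('not NaN')

AnyNaN = Annotated[float, AfterValidator(require_nan)]

class GaugeStrict(BaseModel):
    granularity: Union[PositiveInt, AnyNaN]

print(GaugeStrict(granularity=3).granularity)             # 3
print(GaugeStrict(granularity=float('nan')).granularity)  # nan
try:
    GaugeStrict(granularity=2.5)  # neither a positive int nor NaN
except ValidationError:
    print('rejected: not a positive int, not NaN')
```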

Why this nuance matters

Data pipelines often shuttle values between Pandas, NumPy, and Pydantic. Small type differences like numpy.float64 versus float can silently break tight schemas that rely on Literal or strict unions. If a model is intended to treat NaN as a first-class “missing” token alongside constrained integers, you want a solution that accepts the data emitted by Pandas without opening the door to arbitrary floats.

Takeaways

When validating Pandas outputs, remember that missing numeric values arrive as NumPy scalars. Literal[np.nan] matches only by object identity, because NaN never equals anything, and it will reject the fresh scalars a Series produces. To keep your model precise, either normalize NaN inputs to the np.nan object before the literal match or switch to an "any NaN" validator that enforces the semantic property of being NaN while keeping integers strictly constrained. This keeps models predictable, avoids over-permissive types, and aligns neatly with how Pandas represents missing data.

The article is based on a question from StackOverflow by MikeFenton and an answer by Dmitry543.