2025, Oct 19 17:00

Why Altair CDF area charts break after transform_quantile—and how to fix stacking and duplicate x values

Learn why Altair CDF area charts break after transform_quantile: duplicate x values trigger stacking. Fix by disabling y stacking or adjusting step size.

Area charts are a natural choice for visualizing a CDF because the curve should be monotonically increasing and the area under it grounds the eye. Yet when you build a CDF in Altair via transform_quantile, it’s easy to end up with odd polygons and broken fills. The culprit is not the CDF math itself but how the area mark handles repeated x positions after the quantile transform.

Minimal setup that looks wrong

The following snippet computes quantiles from a small set of outcomes and tries to render them as an area chart.

import altair as alt
import polars as pl

df_src = pl.DataFrame({"outcomes": [16950, 17050, 18750, 18750, 20950]})
(
    alt.Chart(df_src)
    .transform_quantile("outcomes", step=0.1)
    .mark_area(line=True, opacity=0.5)
    .encode(
        x=alt.X("value:Q"),
        y=alt.Y("prob:Q").title("Prob"),
    )
)

What actually happens

With few input values relative to the chosen step, the quantile transform emits several probabilities that map to the same value. You can confirm this by inspecting the intermediate data in the Vega Editor; for example, probabilities like 0.55, 0.65, and 0.75 can all share the same value. Area marks stack the y channel by default, so when several rows share an x coordinate their y values are piled on top of one another instead of being drawn as separate points on the curve. That accumulation creates the unexpected geometry you’re seeing. Adding more distinct outcomes usually removes these collisions, but not if the data contains ties.
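To see where the duplicates come from without opening the Vega Editor, the transform can be approximated in plain Python. This is a minimal sketch, not Vega's actual code: it assumes probabilities are sampled from half the step up to (but excluding) 1, and that values are linearly interpolated between sorted neighbours (the R-7 rule). The helper names quantile_rows and duplicated_x are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction


def quantile_rows(values, step="0.1"):
    """Approximate the (prob, value) rows a quantile transform emits.

    Assumption: probabilities start at step/2 and values use R-7 interpolation.
    """
    step = Fraction(step)  # exact arithmetic avoids float drift in the probabilities
    data = sorted(values)
    rows = []
    p = step / 2
    while p < 1:
        pos = p * (len(data) - 1)        # fractional index into the sorted data
        lo = int(pos)                    # floor, since pos is never negative
        hi = min(lo + 1, len(data) - 1)
        value = data[lo] + (data[hi] - data[lo]) * (pos - lo)
        rows.append((float(p), float(value)))
        p += step
    return rows


def duplicated_x(rows):
    """Return {value: [probs]} for every value shared by more than one row."""
    by_value = defaultdict(list)
    for prob, value in rows:
        by_value[value].append(prob)
    return {value: probs for value, probs in by_value.items() if len(probs) > 1}


rows = quantile_rows([16950, 17050, 18750, 18750, 20950])
print(duplicated_x(rows))
```

For the sample data, this groups 0.55, 0.65, and 0.75 under 18750, because all three probabilities fall on or between the two tied inputs.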

Fix: turn off stacking (or adjust the transform)

The simplest fix is to disable stacking on the y channel so that repeated x positions don’t accumulate on top of each other. If you can afford it, another option is to increase the step size so fewer probabilities collide on the same value.

import altair as alt
import polars as pl

df_src = pl.DataFrame({"outcomes": [16950, 17050, 18750, 18750, 20950]})
(
    alt.Chart(df_src)
    .transform_quantile("outcomes", step=0.1)
    .mark_area(line=True, opacity=0.5)
    .encode(
        x=alt.X("value:Q"),
        # stack=None turns off the default stacking of rows that share an x value
        y=alt.Y("prob:Q", stack=None).title("Prob"),
    )
)

About the Y2 baseline and the y2/Y2 API

In an area chart, 0 is the default baseline for Y2. However, once multiple rows share the same x position, the renderer has to decide how to combine those y values. Should it sum them, take the maximum, or do something else? That ambiguity is precisely why the polygon looks strange when repeated x values appear. Disabling stacking removes that ambiguity. Another way to resolve it is to aggregate, for example by taking the maximum probability on the y channel so duplicate x positions collapse into a single point.

Regarding API usage, y2 can be set to a constant baseline with a datum. Either pass it as a keyword, y2=alt.Y2Datum(0), or pass alt.Y2Datum(0) positionally to encode. This can make a chart appear to “work,” but it’s generally better to address the root cause by preventing stacking or by aggregating the y values, because the core issue is not the baseline but duplicated x positions.

Why it’s worth knowing

Quantile transforms and area marks are powerful, but they intersect in a way that surfaces the semantics of stacking. When repeated x positions occur, area stacking can distort a CDF that should otherwise be monotonic and visually simple. Knowing how to control stacking, and when to aggregate or change the step size, keeps the chart faithful to the underlying distribution.

Conclusion

If your Altair CDF area chart looks jagged or “folded,” check the intermediate quantile output for repeated values on the x axis. Turn off stacking on the y channel to avoid unintended accumulation, or aggregate the probabilities so duplicates resolve deterministically. If precision allows, consider a larger quantile step so that fewer probabilities land on the same value. With those guardrails in place, an area-mark CDF will render as the increasing curve you expect.

The article is based on a question from StackOverflow by Juan Martinez and an answer by kgoodrick.