2025, Oct 20 00:00
How to Align Nodes in a Two-Column Plotly Sankey: Calculate Y by Cumulative Flow Totals (with Python)
Learn how to align nodes in a Plotly Sankey diagram with a fixed layout by computing Y positions from cumulative flow totals. Includes Python code and data.
Getting a Sankey diagram to look “right” is less about colors and labels and more about geometry. If nodes along a column don’t line up with the total flow they represent, the whole chart feels off. The catch is that the vertical position of a node is defined by its center, not its top, so hand-picking y coordinates almost always leads to misalignment. Below is a compact walkthrough of how to align nodes properly in a two-column Sankey by computing Y based on the aggregated flow.
Problem setup and a minimal reproducible snippet
The diagram uses fixed positions for nodes on the left and right. Categories are ordered and duplicated across both sides, links are colored by their source, and x/y are manually specified. That manual y placement is where the visual misalignment comes from.
import pandas as pd
import plotly.graph_objects as go
from io import StringIO
# Load data
csv_buf = StringIO("""
from_cat,to_cat,percent
rpf,bp,3.55314197051978
rpf,cc,6.19084561675718
rpf,es,1.21024049650892
rpf,ic,2.46702870442203
rpf,rpf,2.26532195500388
rpf,sc,6.54771140418929
bp,bp,0.977501939487975
bp,cc,0.403413498836307
bp,es,0.108611326609775
bp,ic,4.7944142746315
bp,rpf,0.387897595034911
bp,sc,1.81536074476338
ic,bp,0.124127230411171
ic,cc,0.21722265321955
ic,es,0.0155159038013964
ic,ic,0.170674941815361
ic,rpf,0.0155159038013964
ic,sc,0.294802172226532
cc,bp,1.25678820791311
cc,cc,7.50969743987587
cc,es,9.41815360744763
cc,ic,0.775795190069822
cc,rpf,1.05508145849496
cc,sc,20.8068269976726
cc,sr,0.0465477114041893
sc,bp,0.0155159038013964
sc,cc,0.325833979829325
sc,es,1.92397207137316
sc,rpf,0.0155159038013964
sc,sc,4.43754848719938
sr,bp,0.0620636152055857
sr,cc,1.55159038013964
sr,es,5.10473235065943
sr,ic,0.0155159038013964
sr,rpf,0.0155159038013964
sr,sc,9.71295577967417
sr,sr,0.0775795190069822
es,bp,0.108611326609775
es,cc,0.574088440651668
es,es,1.48952676493406
es,ic,0.0310318076027929
es,rpf,0.0620636152055857
es,sc,2.00155159038014
es,sr,0.0465477114041893
""")
frame = pd.read_csv(csv_buf, skipinitialspace=True)
# Category order
tier_order = ["es", "sr", "sc", "cc", "ic", "bp", "rpf"]
frame["from_cat"] = pd.Categorical(frame["from_cat"], categories=tier_order, ordered=True)
frame["to_cat"] = pd.Categorical(frame["to_cat"], categories=tier_order, ordered=True)
# Deterministic ordering
frame = frame.sort_values(["from_cat", "to_cat"]).reset_index(drop=True)
# Left/right labels and indices
left_tiers = tier_order
right_tiers = tier_order
node_labels = [f"{c} (L)" for c in left_tiers] + [f"{c} (R)" for c in right_tiers]
label_to_id = {lbl: i for i, lbl in enumerate(node_labels)}
frame["src_id"] = frame["from_cat"].map(lambda c: label_to_id.get(f"{c} (L)", -1))
frame["dst_id"] = frame["to_cat"].map(lambda c: label_to_id.get(f"{c} (R)", -1))
frame["src_id"] = pd.to_numeric(frame["src_id"], downcast="integer", errors="coerce").fillna(-1).astype(int)
frame["dst_id"] = pd.to_numeric(frame["dst_id"], downcast="integer", errors="coerce").fillna(-1).astype(int)
# Colors
COLOR_BY_GROUP = {
    "es": "#F6C57A",
    "sr": "#A6D8F0",
    "sc": "#7BDCB5",
    "cc": "#FFC20A",
    "ic": "#88BDE6",
    "bp": "#F4A582",
    "rpf": "#DDA0DD",
    "Unknown": "#D3D3D3"
}
node_fill = [COLOR_BY_GROUP[c] for c in left_tiers] + [COLOR_BY_GROUP[c] for c in right_tiers]
link_fill = [node_fill[s] for s in frame["src_id"].tolist()]
# Manual positions (problematic y placement)
x_pos = [
    0.001, 0.001, 0.001, 0.001, 0.001, 0.001, 0.001,
    0.999, 0.999, 0.999, 0.999, 0.999, 0.999, 0.999
]
y_pos = [
    0.05, 0.18, 0.31, 0.44, 0.57, 0.70, 0.83,
    0.05, 0.18, 0.31, 0.44, 0.57, 0.70, 0.83
]
chart = go.Figure(go.Sankey(
    arrangement="fixed",
    node=dict(
        pad=40,
        thickness=25,
        line=dict(color="black", width=0.5),
        label=node_labels,
        x=x_pos,
        y=y_pos,
        color=node_fill
    ),
    link=dict(
        source=frame["src_id"].tolist(),
        target=frame["dst_id"].tolist(),
        value=frame["percent"].tolist(),
        color=link_fill,
        hovertemplate="%{source.label} → %{target.label}<br><b>%{value:.2f}%</b><extra></extra>"
    ),
    valueformat=".2f",
    valuesuffix="%"
))
chart.update_layout(
    title="Flow",
    font_size=12,
    paper_bgcolor="#f7f7f7",
    plot_bgcolor="#f7f7f7",
    margin=dict(l=30, r=30, t=60, b=30),
    width=1000,
    height=800
)
chart.show()
What goes wrong and why
In a Sankey with arrangement set to fixed, Y is the vertical coordinate of the node center. Spacing nodes evenly ignores how tall each node actually is (its height is proportional to its total flow), so the center of a large node ends up offset from the flow it represents. That’s why links look slightly skewed or crowded. To line things up, derive Y from the cumulative totals in each column: sum the totals of all nodes above, then add half of the current node’s total to get its center. Finally, normalize those values to the [0, 1] canvas range; the percentages in this dataset sum to 100, so dividing by 100 is enough.
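As a small worked illustration of that rule, here is a minimal sketch with made-up totals. The numbers and the name `toy_totals` are hypothetical and not part of the dataset; the point is only to show the arithmetic for one column.

```python
import pandas as pd

# Hypothetical column totals, listed top to bottom (not from the article's data)
toy_totals = pd.Series([20.0, 50.0, 30.0], index=["a", "b", "c"])

# Center = (sum of everything above) + (half of own total),
# which is the same as cumsum minus half of own total.
# Dividing by the column total maps the result into [0, 1].
toy_centers = toy_totals.cumsum().sub(toy_totals / 2).div(toy_totals.sum())

print(toy_centers.round(2).tolist())  # [0.1, 0.45, 0.85]
```

Node `b` spans 0.20 to 0.70 of the column, so its center lands at 0.45 rather than in an evenly spaced slot.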
The data is categorized and explicitly ordered, and a deterministic sort keeps the visual order of from_cat and to_cat stable across runs. That stability is important when you tie computed positions back to labels and colors.
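To see why the explicit order matters, the following sketch compares a plain string sort with an ordered Categorical on a tiny, hypothetical frame (`demo` and `demo_order` are illustrative names, and the values are invented). It also shows that a groupby on the categorical column follows the declared order, which is what a cumulative sum relies on.

```python
import pandas as pd

# Hypothetical mini-frame; only the category codes echo the article's data
demo = pd.DataFrame({"from_cat": ["sc", "es", "sr"], "percent": [3.0, 1.0, 2.0]})

# A plain string sort is alphabetical
alphabetical = demo.sort_values("from_cat")["from_cat"].tolist()

# An ordered Categorical sorts by the declared order instead
demo_order = ["es", "sr", "sc"]
demo["from_cat"] = pd.Categorical(demo["from_cat"], categories=demo_order, ordered=True)
declared = demo.sort_values("from_cat")["from_cat"].tolist()

# groupby on the categorical column returns groups in the declared order too
grouped_order = demo.groupby("from_cat", observed=True)["percent"].sum().index.tolist()

print(alphabetical)   # ['es', 'sc', 'sr']
print(declared)       # ['es', 'sr', 'sc']
print(grouped_order)  # ['es', 'sr', 'sc']
```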
Fix and the aligned version
The approach is straightforward: aggregate flows per category for the left column and for the right column, compute cumulative centers, and use those as Y. The X positions stay at the edges. An auxiliary table simplifies the math.
import pandas as pd
import plotly.graph_objects as go
from io import StringIO
# Load data
csv_buf = StringIO("""
from_cat,to_cat,percent
rpf,bp,3.55314197051978
rpf,cc,6.19084561675718
rpf,es,1.21024049650892
rpf,ic,2.46702870442203
rpf,rpf,2.26532195500388
rpf,sc,6.54771140418929
bp,bp,0.977501939487975
bp,cc,0.403413498836307
bp,es,0.108611326609775
bp,ic,4.7944142746315
bp,rpf,0.387897595034911
bp,sc,1.81536074476338
ic,bp,0.124127230411171
ic,cc,0.21722265321955
ic,es,0.0155159038013964
ic,ic,0.170674941815361
ic,rpf,0.0155159038013964
ic,sc,0.294802172226532
cc,bp,1.25678820791311
cc,cc,7.50969743987587
cc,es,9.41815360744763
cc,ic,0.775795190069822
cc,rpf,1.05508145849496
cc,sc,20.8068269976726
cc,sr,0.0465477114041893
sc,bp,0.0155159038013964
sc,cc,0.325833979829325
sc,es,1.92397207137316
sc,rpf,0.0155159038013964
sc,sc,4.43754848719938
sr,bp,0.0620636152055857
sr,cc,1.55159038013964
sr,es,5.10473235065943
sr,ic,0.0155159038013964
sr,rpf,0.0155159038013964
sr,sc,9.71295577967417
sr,sr,0.0775795190069822
es,bp,0.108611326609775
es,cc,0.574088440651668
es,es,1.48952676493406
es,ic,0.0310318076027929
es,rpf,0.0620636152055857
es,sc,2.00155159038014
es,sr,0.0465477114041893
""")
frame = pd.read_csv(csv_buf, skipinitialspace=True)
# Category order
tier_order = ["es", "sr", "sc", "cc", "ic", "bp", "rpf"]
frame["from_cat"] = pd.Categorical(frame["from_cat"], categories=tier_order, ordered=True)
frame["to_cat"] = pd.Categorical(frame["to_cat"], categories=tier_order, ordered=True)
# Deterministic ordering
frame = frame.sort_values(["from_cat", "to_cat"]).reset_index(drop=True)
# Left/right labels and indices
left_tiers = tier_order
right_tiers = tier_order
node_labels = [f"{c} (L)" for c in left_tiers] + [f"{c} (R)" for c in right_tiers]
label_to_id = {lbl: i for i, lbl in enumerate(node_labels)}
frame["src_id"] = frame["from_cat"].map(lambda c: label_to_id.get(f"{c} (L)", -1))
frame["dst_id"] = frame["to_cat"].map(lambda c: label_to_id.get(f"{c} (R)", -1))
frame["src_id"] = pd.to_numeric(frame["src_id"], downcast="integer", errors="coerce").fillna(-1).astype(int)
frame["dst_id"] = pd.to_numeric(frame["dst_id"], downcast="integer", errors="coerce").fillna(-1).astype(int)
# Colors
COLOR_BY_GROUP = {
    "es": "#F6C57A",
    "sr": "#A6D8F0",
    "sc": "#7BDCB5",
    "cc": "#FFC20A",
    "ic": "#88BDE6",
    "bp": "#F4A582",
    "rpf": "#DDA0DD",
    "Unknown": "#D3D3D3"
}
node_fill = [COLOR_BY_GROUP[c] for c in left_tiers] + [COLOR_BY_GROUP[c] for c in right_tiers]
link_fill = [node_fill[s] for s in frame["src_id"].tolist()]
# X positions anchored to edges, one entry per node so the list always matches node_labels
x_pos = [0.001] * len(left_tiers) + [0.999] * len(right_tiers)
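# Compute Y as cumulative centers per side; Y is the center of each node.
# observed=False keeps categories that have no rows (total 0), so y_pos has one entry per node.
pos_df = pd.DataFrame()
pos_df["left_total"] = frame.groupby("from_cat", observed=False)["percent"].sum()
# Center = cumulative total up to and including this node, minus half of its own total.
# The percentages sum to 100, so dividing by 100 normalizes to [0, 1].
pos_df["left_center"] = pos_df["left_total"].cumsum().sub(pos_df["left_total"]/2).div(100)
pos_df["right_total"] = frame.groupby("to_cat", observed=False)["percent"].sum()
pos_df["right_center"] = pos_df["right_total"].cumsum().sub(pos_df["right_total"]/2).div(100)
# Left nodes first, then right nodes, in the same order as node_labels
y_pos = pos_df["left_center"].tolist() + pos_df["right_center"].tolist()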
chart = go.Figure(go.Sankey(
    arrangement="fixed",
    node=dict(
        pad=40,
        thickness=25,
        line=dict(color="black", width=0.5),
        label=node_labels,
        x=x_pos,
        y=y_pos,
        color=node_fill
    ),
    link=dict(
        source=frame["src_id"].tolist(),
        target=frame["dst_id"].tolist(),
        value=frame["percent"].tolist(),
        color=link_fill,
        hovertemplate="%{source.label} → %{target.label}<br><b>%{value:.2f}%</b><extra></extra>"
    ),
    valueformat=".2f",
    valuesuffix="%"
))
chart.update_layout(
    title="Flow",
    font_size=12,
    paper_bgcolor="#f7f7f7",
    plot_bgcolor="#f7f7f7",
    margin=dict(l=30, r=30, t=60, b=30),
    width=1000,
    height=800
)
chart.show()
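The division by 100 in the code above works because the percent column sums to 100. For values that do not, one option is to normalize by the column total instead. The helper below is a hedged sketch of that idea: `column_centers` is a hypothetical name (not a Plotly or pandas function), the demo frame is invented, and it makes the same simplification as the code above in that only flow totals enter the calculation.

```python
import pandas as pd

def column_centers(flows, key, order, value="percent"):
    """Return node centers in [0, 1] for one column, following `order`."""
    # Group on plain strings so the helper works for string or categorical columns
    totals = (
        flows.groupby(flows[key].astype(str))[value].sum()
        .reindex(order, fill_value=0.0)  # categories without rows get a total of 0
    )
    grand_total = totals.sum()
    if grand_total <= 0:
        raise ValueError("column totals must sum to a positive value")
    # cumsum minus half of own total gives the center; divide by the column total
    return totals.cumsum().sub(totals / 2).div(grand_total).tolist()

# Hypothetical flows that do not sum to 100
demo_flows = pd.DataFrame({
    "from_cat": ["a", "a", "b"],
    "to_cat": ["x", "y", "y"],
    "amount": [2.0, 2.0, 4.0],
})
left_y = column_centers(demo_flows, "from_cat", ["a", "b"], value="amount")
right_y = column_centers(demo_flows, "to_cat", ["x", "y"], value="amount")
y_demo = left_y + right_y

print(y_demo)  # [0.25, 0.75, 0.125, 0.625]
```

The resulting list is ordered left nodes first, then right nodes, so it can be passed as the node `y` argument in the same way as `y_pos`.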
Why this matters
Accurate node alignment makes a Sankey readable at a glance. When centers follow the true cumulative width, links flow smoothly and the visual weight matches the underlying values. It also improves consistency: category order is fixed through Categorical with ordered=True and a deterministic sort, which keeps the left and right stacks stable.
Takeaways
If you control node positions in a Plotly Sankey with arrangement set to fixed, compute Y from totals per category. Treat Y as the node center: sum the totals of all previous nodes, add half of the current node’s total, and normalize to the [0, 1] canvas range. Keep category order explicit and sorting deterministic to preserve label and color mapping. With that, the diagram will align correctly without manual nudging of coordinates.