2025, Nov 06 23:00

Efficiently convert a pandas DataFrame of edge weights to a dict-of-dicts-of-sets for NetworkX

Learn how to convert a pandas DataFrame of edge weights into a dict-of-dicts-of-sets, keeping only nonzero edges, so you can build NetworkX graphs quickly and accurately.

Transforming a pandas DataFrame of edge weights into a dict-of-dicts-of-sets can easily become a bottleneck if you naively loop over every cell. When the goal is to build a NetworkX graph from the nonzero entries, you want something concise and efficient that preserves the exact structure your downstream code expects.

Problem overview

There is a DataFrame where the index holds node labels and each column represents a potential connection with a numeric weight. Only nonzero values represent existing edges, and each weight should be wrapped in a single-element set:

import pandas as pd

payload = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1], 'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
frame = pd.DataFrame(data=payload, columns=['I','X','Y','Z','W'])
frame.set_index('I', inplace=True, drop=True)

The desired structure is a mapping from node to its nonzero attributes and their weights, where each weight is represented as a set:

{'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}}, 'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}

What’s really happening

Each row encodes a node, each column is a potential edge attribute, and the cell value is the weight. The task is to skip zeros and carry through only existing edges, wrapping each scalar weight in a one-element set. This format is convenient for constructing and analyzing a NetworkX graph, but building it cell by cell (for example, with nested loops that index each cell individually) doesn't scale well.
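To make the mapping from cells to edges concrete, here is a minimal sketch that flattens the target structure into (node, attribute, weight) triples, the shape accepted by NetworkX's `Graph.add_weighted_edges_from`. The helper name `to_weighted_triples` is hypothetical, and the sketch assumes every set holds exactly one weight, as in the desired output above.

```python
def to_weighted_triples(edge_map):
    """Flatten {node: {attr: {weight}}} into (node, attr, weight) triples.

    Hypothetical helper for illustration; assumes each inner set has exactly one weight.
    """
    triples = []
    for node, attrs in edge_map.items():
        for attr, weights in attrs.items():
            (weight,) = weights  # unpack the single-element set; raises ValueError otherwise
            triples.append((node, attr, weight))
    return triples


edge_map = {'A': {'X': {1}, 'Z': {1}, 'W': {3}}, 'B': {'Y': {1}, 'W': {2}},
            'C': {'X': {3}, 'Y': {2}}, 'D': {'Y': {1}, 'X': {1}}}
triples = to_weighted_triples(edge_map)
print(triples[:3])  # [('A', 'X', 1), ('A', 'Z', 1), ('A', 'W', 3)]
```

With NetworkX installed, `G.add_weighted_edges_from(triples)` would store each weight under the default `weight` edge attribute.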

Solution approaches

A straightforward, clean approach is to iterate row-wise with iterrows and filter on the fly in a dictionary comprehension. It isn't vectorized, but it replaces explicit nested for-loops and index bookkeeping with a compact, readable expression:

edge_map = {
    src: {attr: {val} for attr, val in attrs.items() if val != 0}
    for src, attrs in frame.iterrows()
}
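One caveat with `iterrows`: the pandas documentation notes that it yields each row as a Series, so dtypes are not preserved across a row. The small frame below is a made-up example with one integer column and one float column; it shows the integer weight coming back as a float.

```python
import pandas as pd

# Made-up frame: 'X' holds integers, 'Y' holds floats.
mixed = pd.DataFrame({'X': [1, 0], 'Y': [0.5, 2.0]}, index=['A', 'B'])

edge_map = {
    src: {attr: {val} for attr, val in attrs.items() if val != 0}
    for src, attrs in mixed.iterrows()
}

# Each row was upcast to a common float dtype, so the integer weight 1 is now 1.0.
x_weight = next(iter(edge_map['A']['X']))
print(x_weight)  # 1.0
```

If exact integer types matter downstream, keep the weight columns at a single dtype or use the conversion shown next.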

To reduce pandas iteration overhead further, convert the DataFrame once with to_dict(orient='index') and then build the nested structure from that. Unlike iterrows, this does not construct a Series for every row, and the filtering runs entirely on native Python data structures:

row_dict = frame.to_dict(orient='index')
edge_map = {
    src: {k: {v} for k, v in inner.items() if v != 0}
    for src, inner in row_dict.items()
}
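For reference, the intermediate `row_dict` holds one plain dict per index label, zeros included, which the comprehension then filters. The sketch rebuilds the frame from the problem statement so it runs on its own.

```python
import pandas as pd

payload = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1],
           'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
frame = pd.DataFrame(payload).set_index('I')

# One inner dict per node label, one entry per column (zeros still present).
row_dict = frame.to_dict(orient='index')

# Drop the zero weights and wrap each remaining weight in a one-element set.
edge_map = {
    src: {k: {v} for k, v in inner.items() if v != 0}
    for src, inner in row_dict.items()
}
```

Note that `orient='index'` raises a ValueError when the index contains duplicate labels, so repeated node labels need to be aggregated or deduplicated first.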

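You can also compress this into a single comprehension without an intermediate variable. Here the filter relies on truthiness (if val), which for numeric weights drops exactly the same zeros as val != 0: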

edge_map = {
    node: {col: {val} for col, val in cols.items() if val}
    for node, cols in frame.to_dict(orient='index').items()
}
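If even the per-row conversion is too slow, one option (not part of the original answer) is to let NumPy locate the nonzero cells and loop only over those. This is a minimal sketch, assuming all weight columns are numeric so that `to_numpy()` returns a numeric array. `np.nonzero` reports positions in row-major order, so attributes keep their column order within each node.

```python
import numpy as np
import pandas as pd

payload = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1],
           'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}
frame = pd.DataFrame(payload).set_index('I')

values = frame.to_numpy()
rows, cols = np.nonzero(values)  # row/column positions of every nonzero cell

# Start with an empty dict per node so all-zero rows still appear,
# matching the comprehension versions above.
edge_map = {node: {} for node in frame.index}
for r, c in zip(rows, cols):
    # .item() converts the NumPy scalar into a plain Python number
    edge_map[frame.index[r]][frame.columns[c]] = {values[r, c].item()}
```

The Python loop now touches only nonzero cells, which pays off most when the weight matrix is sparse.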

Why this matters

For large datasets, clarity and performance go hand in hand. Converting once and filtering in a single comprehension avoids per-cell pandas indexing, produces exactly the nested structure your graph code expects, and needs no manual nested loops. It's also easy to reason about correctness: nonzero values are kept, zeros are dropped, and each weight is wrapped in a one-element set.
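One edge case the comprehensions above do not handle is missing weights. NaN is truthy and compares unequal to 0, so both `val != 0` and `if val` keep it as an edge. If empty cells in your data should mean "no edge", one possible approach is to add a `pd.notna` guard. The tiny frame below is made up for illustration.

```python
import numpy as np
import pandas as pd

# Made-up frame with a missing weight for node 'A' in column 'Y'.
frame = pd.DataFrame({'X': [1.0, 0.0], 'Y': [np.nan, 2.0]}, index=['A', 'B'])

# Truthiness filter: NaN survives as an edge with weight {nan}.
naive = {
    node: {col: {val} for col, val in cols.items() if val}
    for node, cols in frame.to_dict(orient='index').items()
}

# Extra pd.notna check: NaN cells are dropped along with zeros.
cleaned = {
    node: {col: {val} for col, val in cols.items() if pd.notna(val) and val != 0}
    for node, cols in frame.to_dict(orient='index').items()
}
print(cleaned)  # {'A': {'X': {1.0}}, 'B': {'Y': {2.0}}}
```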

Takeaways

When you need a dict-of-dicts-of-sets from a DataFrame of edge weights, convert it once with to_dict(orient='index') and filter in a single comprehension. If you prefer to stay in pandas, iterating with iterrows is still acceptable for moderately sized frames and perfectly readable, though it builds a Series per row and can upcast mixed dtypes. And if you don't need DataFrame features outside this conversion, consider working directly with a plain dictionary from the start; it's simpler and typically faster to process.
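If the weights already arrive as plain Python data, such as the `payload` dict from the problem statement, a sketch without pandas can index each column list by row position. It assumes the node labels live under the 'I' key and that every column list has the same length.

```python
payload = {'I': ['A', 'B', 'C', 'D'], 'X': [1, 0, 3, 1], 'Y': [0, 1, 2, 1],
           'Z': [1, 0, 0, 0], 'W': [3, 2, 0, 0]}

nodes = payload['I']
# Every key except the node-label column is a weight column.
weight_cols = {col: vals for col, vals in payload.items() if col != 'I'}

# For node i, keep each column's i-th weight if it is nonzero, wrapped in a set.
edge_map = {
    node: {col: {vals[i]} for col, vals in weight_cols.items() if vals[i] != 0}
    for i, node in enumerate(nodes)
}
print(edge_map['B'])  # {'Y': {1}, 'W': {2}}
```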

The article is based on a question from Stack Overflow by carpediem and an answer by Viktor Sbruev.
