2025, Dec 19 21:00
How to Attach Precomputed Node-Level Tensors to torch_geometric Data Objects in the ZINC Dataset
Learn to enrich PyTorch Geometric ZINC graphs by attaching per-graph node features as tensors via Data attributes and an InMemoryDataset wrapper. Includes code.
Attaching precomputed tensor features to each graph in a torch_geometric dataset is a common need when you enrich node representations outside the usual pipeline. The goal is straightforward: take a per-graph tensor you’ve already computed and store it alongside the existing fields inside each Data object, so downstream code can consume it transparently.
Problem setup
Suppose you load the ZINC train split and inspect the first graph. You see a Data object with node features, edge indices, edge attributes, and a target:
from torch_geometric.datasets import ZINC
zinc_train = ZINC(root='my_path', split='train')
print(zinc_train[0]) # Example structure
# Data(x=[33, 1], edge_index=[2, 72], edge_attr=[72], y=[1])
You have already computed an additional tensor feature for each graph, arranged as a list where the i-th tensor corresponds to the i-th graph in the dataset, and each tensor is intended to be a node-level feature. The desired end state is that every Data includes this payload, for example:
Data(x=[33, 1], edge_index=[2, 72], edge_attr=[72], y=[1], new_feature=[33, 12])
What’s actually happening
Each element of ZINC is a torch_geometric.data.Data instance. These objects are flexible and can hold arbitrary attributes. If your external features are prepared per graph and their first dimension matches the number of nodes in that graph, you can attach them directly onto each Data instance and keep them as part of the dataset.
Solution
You can inject the new tensors by iterating the base dataset and setting an attribute on every Data object. One convenient way is to wrap the original dataset with a lightweight InMemoryDataset that performs the augmentation once and exposes the modified items.
import torch
from torch_geometric.datasets import ZINC
from torch_geometric.data import InMemoryDataset
# 1) Load the source dataset
base_ds = ZINC(root='my_path', split='train')
# 2) Build a list of new node-wise tensors aligned with base_ds
# Replace the following with your real tensors; shapes must agree per graph
aug_tensor_list = []
for g in base_ds:
    node_count = g.x.size(0)
    # Example placeholder: [num_nodes, 12]
    feat_tensor = torch.randn(node_count, 12)
    aug_tensor_list.append(feat_tensor)
# 3) Wrap and attach the features to each Data instance
class ZINCEnriched(InMemoryDataset):
    def __init__(self, src_ds, feature_list):
        # Attach the i-th tensor to the i-th graph before initializing the base class
        self._store = []
        for idx in range(len(src_ds)):
            item = src_ds[idx]
            item.new_feature = feature_list[idx]
            self._store.append(item)
        super().__init__('.', transform=None, pre_transform=None)
        # Collate into the standard InMemoryDataset storage
        self.data, self.slices = self.collate(self._store)

    def __len__(self):
        return len(self._store)

    def get(self, index):
        return self._store[index]
# 4) Create the enriched dataset
zinc_with_extra = ZINCEnriched(base_ds, aug_tensor_list)
# 5) Inspect a sample
example = zinc_with_extra[0]
print(example)
print("Shape of new feature:", example.new_feature.shape)
# Data(x=[33, 1], edge_index=[2, 72], edge_attr=[72], y=[1], new_feature=[33, 12])
# Shape of new feature: torch.Size([33, 12])
Why this matters
Once the additional tensor is attached to each Data object, it travels together with the graph wherever that object goes. That keeps your preprocessing and modeling steps aligned: the model can access all needed per-node information from a single place without juggling parallel structures.
Takeaways
If you maintain a one-to-one correspondence between dataset items and your tensors, and each tensor's first dimension matches that graph's node count, you can set a new attribute directly on every Data object. The approach above works when the dataset is instantiated as shown; passing a transform at construction time changes what indexing returns, so if you need a transform, verify the behavior rather than assuming this setup carries over unchanged.