2025, Oct 07 23:00

Fix Keras multi-output training errors: match y_true/y_pred structures with dict labels in tf.data

Learn why Keras multi-output models fail with y_true/y_pred structure mismatch and how to fix it: reshape targets into a dict keyed by head names using tf.data.

Training a multi-output image classifier in Keras can fail before the first step with a structure mismatch: the model emits a list of outputs, one per head, while the dataset provides a single label vector per sample. Keras cannot pair the losses with the targets, so training stops. Below is a walkthrough of why this happens and how to structure the labels so that Keras can align them with each output head.

Symptom

When starting training, the process stops with:

ValueError: y_true and y_pred have different structures.
y_true: *
y_pred: ['*', '*', '*', '*']

Problem setup: minimal example

The dataset yields an image and a 4-element vector of categorical targets per image. The model has four heads and uses SparseCategoricalCrossentropy for each head.

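import os

import tensorflow as tf

def fetch_targets(p):
    # Runs eagerly inside tf.py_function, so p is an EagerTensor holding the path.
    p = p.numpy().decode("utf-8")
    k = os.path.basename(p)[:9]  # first 9 characters of the file name, e.g. 'Img_00001'
    if k not in target_map:
        print("Missing key:", k)
        raise ValueError("Missing label key.")
    return tf.convert_to_tensor(target_map[k], dtype=tf.uint8)

def load_frame(p):
    raw = tf.io.read_file(p)
    pic = tf.io.decode_jpeg(raw, channels=3)
    return tf.image.resize_with_crop_or_pad(pic, 360, 360)

def make_sample(file_p):
    # py_function loses static shape information, so it is restored with set_shape.
    y = tf.py_function(func=fetch_targets, inp=[file_p], Tout=tf.uint8)
    y.set_shape([4])
    x = tf.py_function(func=load_frame, inp=[file_p], Tout=tf.uint8)
    x.set_shape([360, 360, 3])
    return x, y  # y is one 4-element vector: this is what triggers the error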
# Example content of target_map
# 'Img_00001': [0, 1, 0, 1], 'Img_00002': [2, 0, 4, 1], 'Img_00003': [2, 0, 1, 0],
# 'Img_00004': [4, 1, 2, 1], 'Img_00005': [3, 1, 3, 1], 'Img_00006': [1, 1, 5, 1]
split_count = int(file_list_ds.cardinality().numpy() * 0.2)
train_ds = file_list_ds \
  .skip(split_count) \
  .map(make_sample, num_parallel_calls=tf.data.AUTOTUNE) \
  .cache() \
  .batch(100) \
  .prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = file_list_ds \
  .take(split_count) \
  .map(make_sample, num_parallel_calls=tf.data.AUTOTUNE) \
  .cache() \
  .batch(100) \
  .prefetch(buffer_size=tf.data.AUTOTUNE)
input_tensor = tf.keras.layers.Input(shape=(360, 360, 3))
feats = tf.keras.layers.Rescaling(1./255)(input_tensor)
feats = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.MaxPooling2D((2, 2))(feats)
feats = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.MaxPooling2D((2, 2))(feats)
feats = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.Flatten()(feats)
feats = tf.keras.layers.Dense(128, activation='relu')(feats)
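# Name each head explicitly: the keys in the loss and metrics dicts refer to
# layer names, not to the Python variables that hold the layers.
head_label = tf.keras.layers.Dense(len(label_classes), name="head_label")(feats)
head_cellshape = tf.keras.layers.Dense(len(cellshape_classes), name="head_cellshape")(feats)
head_nucleusshape = tf.keras.layers.Dense(len(nucleusshape_classes), name="head_nucleusshape")(feats)
head_cytovacuole = tf.keras.layers.Dense(len(cytovacuole_classes), name="head_cytovacuole")(feats)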
net = tf.keras.Model(
    inputs=input_tensor,
    outputs=[head_label, head_cellshape, head_nucleusshape, head_cytovacuole]
)
net.compile(
  optimizer=tf.keras.optimizers.Adam(),
  loss={
    # The heads are Dense layers without an activation, so they emit logits.
    "head_label": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_cellshape": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_nucleusshape": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_cytovacuole": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  },
  metrics={
    "head_label": ["sparse_categorical_accuracy"],
    "head_cellshape": ["sparse_categorical_accuracy"],
    "head_nucleusshape": ["sparse_categorical_accuracy"],
    "head_cytovacuole": ["sparse_categorical_accuracy"]
  }
)
history = net.fit(
  train_ds,  # already batched by the dataset, so no batch_size argument here
  validation_data=val_ds,
  epochs=10,
  validation_steps=1
)

What causes the error

The model produces four outputs, one per classification head. Keras expects the target structure to mirror the prediction structure. Instead of four targets (one per head), the dataset yields a single vector with four integers. This mismatch triggers the “different structures” error because Keras cannot associate each loss function with its corresponding part of the target.
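To make the mismatch concrete, here is a minimal plain-Python sketch of how the two structures compare. It is a stand-in for illustration, not Keras's actual implementation: the Leaf class and both helper functions are hypothetical, and a Leaf plays the role of a tensor, which counts as one item no matter how many numbers it holds.

```python
class Leaf:
    """Stand-in for a tensor: one opaque value with no nested structure."""

    def __init__(self, label):
        self.label = label


def describe(value):
    """Keep the containers and replace every leaf with '*', like the error text."""
    if isinstance(value, dict):
        return {key: describe(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [describe(item) for item in value]
    return "*"


def count_leaves(value):
    """Count how many separate tensors a structure contains."""
    if isinstance(value, dict):
        return sum(count_leaves(item) for item in value.values())
    if isinstance(value, (list, tuple)):
        return sum(count_leaves(item) for item in value)
    return 1


HEADS = ["head_label", "head_cellshape", "head_nucleusshape", "head_cytovacuole"]

y_pred = [Leaf(name + " prediction") for name in HEADS]
y_true_vector = Leaf("one tensor holding all four class indices")
y_true_dict = {name: Leaf(name + " class index") for name in HEADS}

print("y_true:", describe(y_true_vector))
print("y_pred:", describe(y_pred))
print(count_leaves(y_true_vector), count_leaves(y_pred), count_leaves(y_true_dict))
```

The single label vector is one leaf, while the predictions are four. A dict with one entry per head has four leaves, and Keras can pair each entry with the output that carries the same name.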

Fix: provide a label dict keyed by output names

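Each dataset element must yield its labels as a mapping from output name to class index. The keys are the names of the model’s output layers (the name= argument of each layer, not the Python variable that holds it). With that mapping, Keras can route each piece of y_true to the correct head and loss. A single-sample target should look like this: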

example_target = {
    "head_label": tf.constant(0),         # one scalar class index per head
    "head_cellshape": tf.constant(1),
    "head_nucleusshape": tf.constant(0),
    "head_cytovacuole": tf.constant(1),
}

And for a batch of size 5:

example_batch = {
    "head_label": tf.constant([0, 2, 3, 1, 0]),        # shape (5,): one index per sample
    "head_cellshape": tf.constant([3, 2, 3, 2, 0]),
    "head_nucleusshape": tf.constant([2, 2, 3, 4, 1]),
    "head_cytovacuole": tf.constant([1, 2, 3, 1, 1]),
}

The only change to the input pipeline is to split the four-element vector into a dict keyed by the model’s output names.
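The split can be tried in isolation before touching the real pipeline. The sketch below is a minimal, self-contained check under stated assumptions: the images are blank 8x8 placeholders, the labels are the six vectors from the target_map comment in the problem setup, and split_labels is a hypothetical helper name.

```python
import tensorflow as tf

HEAD_NAMES = ("head_label", "head_cellshape", "head_nucleusshape", "head_cytovacuole")


def split_labels(x, y_vec):
    """Turn one 4-element label vector into a dict of scalar labels."""
    return x, {name: y_vec[i] for i, name in enumerate(HEAD_NAMES)}


# Synthetic stand-ins (assumptions): six blank images and six label vectors.
images = tf.zeros([6, 8, 8, 3], dtype=tf.uint8)
labels = tf.constant(
    [[0, 1, 0, 1], [2, 0, 4, 1], [2, 0, 1, 0],
     [4, 1, 2, 1], [3, 1, 3, 1], [1, 1, 5, 1]],
    dtype=tf.uint8,
)

ds = tf.data.Dataset.from_tensor_slices((images, labels)).map(split_labels).batch(3)
x_batch, y_batch = next(iter(ds))

# After batching, every head has its own 1-D tensor of class indices.
for name in HEAD_NAMES:
    print(name, y_batch[name].shape, y_batch[name].numpy().tolist())
```

Batching happens after the split, so tf.data stacks the scalar labels of each head separately and every dict entry ends up with shape (batch_size,).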

Working pipeline with corrected labels

The listing below repeats the full flow with the fix applied. The change that resolves the error is in make_sample, which now returns a dict of labels keyed by head name instead of a single vector.

import os

import tensorflow as tf

def fetch_targets(p):
    # Runs eagerly inside tf.py_function, so p is an EagerTensor holding the path.
    p = p.numpy().decode("utf-8")
    k = os.path.basename(p)[:9]  # first 9 characters of the file name, e.g. 'Img_00001'
    if k not in target_map:
        print("Missing key:", k)
        raise ValueError("Missing label key.")
    return tf.convert_to_tensor(target_map[k], dtype=tf.uint8)

def load_frame(p):
    raw = tf.io.read_file(p)
    pic = tf.io.decode_jpeg(raw, channels=3)
    return tf.image.resize_with_crop_or_pad(pic, 360, 360)

def make_sample(file_p):
    # py_function loses static shape information, so it is restored with set_shape.
    y_vec = tf.py_function(func=fetch_targets, inp=[file_p], Tout=tf.uint8)
    y_vec.set_shape([4])
    x = tf.py_function(func=load_frame, inp=[file_p], Tout=tf.uint8)
    x.set_shape([360, 360, 3])
    # One scalar label per head; the keys must match the output layer names.
    y_dict = {
        "head_label": y_vec[0],
        "head_cellshape": y_vec[1],
        "head_nucleusshape": y_vec[2],
        "head_cytovacuole": y_vec[3],
    }
    return x, y_dict
split_count = int(file_list_ds.cardinality().numpy() * 0.2)
train_ds = file_list_ds \
  .skip(split_count) \
  .map(make_sample, num_parallel_calls=tf.data.AUTOTUNE) \
  .cache() \
  .batch(100) \
  .prefetch(buffer_size=tf.data.AUTOTUNE)
val_ds = file_list_ds \
  .take(split_count) \
  .map(make_sample, num_parallel_calls=tf.data.AUTOTUNE) \
  .cache() \
  .batch(100) \
  .prefetch(buffer_size=tf.data.AUTOTUNE)
input_tensor = tf.keras.layers.Input(shape=(360, 360, 3))
feats = tf.keras.layers.Rescaling(1./255)(input_tensor)
feats = tf.keras.layers.Conv2D(32, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.MaxPooling2D((2, 2))(feats)
feats = tf.keras.layers.Conv2D(64, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.MaxPooling2D((2, 2))(feats)
feats = tf.keras.layers.Conv2D(128, (3, 3), activation='relu', padding='same')(feats)
feats = tf.keras.layers.Flatten()(feats)
feats = tf.keras.layers.Dense(128, activation='relu')(feats)
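# Name each head explicitly: the keys in the label, loss, and metrics dicts
# refer to layer names, not to the Python variables that hold the layers.
head_label = tf.keras.layers.Dense(len(label_classes), name="head_label")(feats)
head_cellshape = tf.keras.layers.Dense(len(cellshape_classes), name="head_cellshape")(feats)
head_nucleusshape = tf.keras.layers.Dense(len(nucleusshape_classes), name="head_nucleusshape")(feats)
head_cytovacuole = tf.keras.layers.Dense(len(cytovacuole_classes), name="head_cytovacuole")(feats)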
net = tf.keras.Model(
    inputs=input_tensor,
    outputs=[head_label, head_cellshape, head_nucleusshape, head_cytovacuole]
)
net.compile(
  optimizer=tf.keras.optimizers.Adam(),
  loss={
    # The heads are Dense layers without an activation, so they emit logits.
    "head_label": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_cellshape": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_nucleusshape": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    "head_cytovacuole": tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
  },
  metrics={
    "head_label": ["sparse_categorical_accuracy"],
    "head_cellshape": ["sparse_categorical_accuracy"],
    "head_nucleusshape": ["sparse_categorical_accuracy"],
    "head_cytovacuole": ["sparse_categorical_accuracy"]
  }
)
history = net.fit(
  train_ds,  # already batched by the dataset, so no batch_size argument here
  validation_data=val_ds,
  epochs=10,
  validation_steps=1
)

Why this matters

Multi-head models rely on a one-to-one mapping between outputs, losses, and targets. If the dataset packs all targets into a single vector, Keras cannot tell which element belongs to which head. Aligning the structures keeps each loss and metric attached to the right head, so a mismatch shows up as an explicit error at the start of training rather than as a confusing result later.
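One way to catch a naming mismatch before calling fit is to compare the two sets of names directly. The helper below is a hypothetical sketch in plain Python, not part of Keras; it only compares names and knows nothing about shapes or dtypes.

```python
def check_label_keys(output_names, label_keys):
    """Raise if the label dict and the model outputs do not name the same heads."""
    missing = sorted(set(output_names) - set(label_keys))
    unexpected = sorted(set(label_keys) - set(output_names))
    if missing or unexpected:
        raise ValueError(
            f"label keys do not match outputs: missing={missing}, unexpected={unexpected}"
        )


heads = ["head_label", "head_cellshape", "head_nucleusshape", "head_cytovacuole"]

# Matching names pass silently.
check_label_keys(heads, heads)

# Auto-generated layer names (hypothetical values here) would not match the label keys.
try:
    check_label_keys(["dense_1", "dense_2"], ["head_label", "head_cellshape"])
except ValueError as err:
    print(err)
```

With a real model, the likely inputs are net.output_names and the keys of train_ds.element_spec[1], assuming the labels are the second item of each dataset element as in the pipeline above.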

Takeaways

Mirror the model’s output structure in the labels your dataset yields. For multi-output classification with separate SparseCategoricalCrossentropy losses, pass a dict of targets whose keys match the model’s head names. This small adjustment resolves the structure mismatch and lets Keras connect each y_true to its corresponding y_pred and loss function without guesswork.

The article is based on a question from StackOverflow by Fish4203 and an answer by hvater.