2025, Oct 28 05:00
Matching Keras Dense and NumPy: CPU vs GPU numerical differences explained and how to get identical outputs
Learn why Keras Dense and NumPy matmul can differ by 1e-5 due to GPU vs CPU execution, and how to force parity or compare with tolerance for reproducible tests.
Matching a simple Dense layer from TensorFlow/Keras with a pure NumPy forward pass looks trivial. Multiply an input matrix by the weight matrix, skip the bias, and you should get identical numbers. In practice, you may see tiny discrepancies around 1e-5, which is puzzling when the only visible operation is a matrix multiplication.
Minimal example that exhibits the mismatch
import numpy as np
import keras
from keras import layers
print("Keras version:", keras.__version__)
print("Backend", keras.backend.backend())
# Build a tiny model
src = layers.Input((2,), name='inp')
dense_out = layers.Dense(5, kernel_initializer='random_normal', use_bias=False, name='dense')(src)
toy_net = keras.Model(inputs=src, outputs=dense_out)
# Random input
feed = np.random.random(size=(5, 2)).astype(np.float32)
# Keras forward pass
y_keras = toy_net.predict(feed)
# Extract the Dense kernel
[weight_mat] = toy_net.layers[1].get_weights()
# NumPy forward pass
y_np = np.matmul(feed, weight_mat)
# Compare
print("Keras result:\n", y_keras)
print("NumPy result:\n", y_np)
print("Same result:", np.allclose(y_keras, y_np))
What actually causes the difference
The observed gap does not come from custom math in the Dense layer or from a hidden extra operation. It comes from where the math runs. NumPy executes on the CPU, while TensorFlow/Keras may execute the same matmul on the GPU. Floating-point arithmetic is not associative, and different hardware paths use different kernels, instruction orderings, and fused multiply-add behavior for the same mathematical operation, so they round intermediate results differently and produce slightly different outputs.
Even with a small input of shape (5, 2) multiplied by a kernel of shape (2, 5), these rounding differences across execution units can shift individual elements by values on the order of 1e-5.
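A minimal illustration of the underlying effect, independent of Keras: in float32, the result of a sum depends on how the terms are grouped, so two correct implementations of the same matmul can legitimately disagree in the last bits. The specific values below are only illustrative.
import numpy as np
# The same three float32 values summed in two groupings
a, b, c = np.float32(1e-3), np.float32(1.0), np.float32(-1.0)
left = (a + b) + c   # the small term is rounded into the large one first
right = a + (b + c)  # the large terms cancel exactly before the small one is added
print(left, right)              # two slightly different results
print("Equal:", left == right)  # False: same math, different rounding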
How to align Keras with NumPy
If you run the Keras model on the CPU, you get the same numbers as NumPy for this case. Setting CUDA_VISIBLE_DEVICES to -1 before TensorFlow is imported hides the GPU, which forces TensorFlow/Keras onto the CPU path and removes the discrepancy.
import os
os.environ['CUDA_VISIBLE_DEVICES'] = '-1'
import numpy as np
import keras
from keras import layers
print("Keras version:", keras.__version__)
print("Backend", keras.backend.backend())
src = layers.Input((2,), name='inp')
dense_out = layers.Dense(5, kernel_initializer='random_normal', use_bias=False, name='dense')(src)
toy_net = keras.Model(inputs=src, outputs=dense_out)
feed = np.random.random(size=(5, 2)).astype(np.float32)
y_keras = toy_net.predict(feed)
[weight_mat] = toy_net.layers[1].get_weights()
y_np = np.matmul(feed, weight_mat)
print("Same result:", np.allclose(y_keras, y_np))
Why this matters for practitioners
Small numerical shifts can propagate across layers and affect tests, reproducibility checks, or regression thresholds. If you verify a forward pass against a reference NumPy implementation, the compute device behind your deep learning framework directly affects bitwise and tolerance‑based comparisons.
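When running both sides on the same device is not practical, for example in tests that must pass on both CPU-only and GPU machines, the usual pattern is to compare with an explicit tolerance instead of exact equality. A minimal sketch, reusing y_keras and y_np from the example above; the tolerance values are illustrative and should be tuned to your layer sizes and dtype.
# Raises with a readable report if any element deviates beyond the tolerance
np.testing.assert_allclose(y_keras, y_np, rtol=1e-4, atol=1e-6)
# Or inspect the worst-case deviation directly
print("Max abs diff:", np.max(np.abs(y_keras - y_np)))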
Takeaways
If you need parity between TensorFlow/Keras and NumPy for linear layers, run both on CPU or both on GPU. When that is not possible, compare with a tolerance and expect tiny differences, even for simple matrix multiplications. Understanding that the discrepancy comes from CPU versus GPU execution helps you choose the right environment for debugging, validating, and writing reproducible tests.