2025, Sep 30 01:00
Troubleshooting TensorFlow Datasets SSL Certificate Verify Failed for MovieLens 100k Downloads
Learn how TensorFlow Datasets MovieLens 100k downloads failed with SSL certificate verify failed (expired cert), what was tried, and why retrying later fixed
SSL hiccups in data pipelines are rare but disruptive. A TensorFlow Datasets workflow that previously ran cleanly can suddenly fail with an SSL certificate error during download, blocking training and validation runs. Below is a concise walkthrough of the failure, what it looked like in code, what was attempted, and how it ultimately resolved.
Reproducing the issue
The workflow uses TensorFlow Datasets to fetch the MovieLens 100k ratings split. The environment setup included installing tensorflow-data-validation with visualization extras, then invoking tfds.load to pull the dataset.
!pip install --upgrade 'tensorflow_data_validation[visualization]<2'import tensorflow as tf
import tensorflow_datasets as tfds
ds_ratings, ds_info = tfds.load("movielens/100k-ratings", split="train", with_info=True)During execution, the process repeatedly retried and eventually failed with an SSL verification error indicating an expired certificate while attempting to fetch ml-100k.zip.
WARNING:urllib3.connectionpool:Retrying (Retry(total=9, connect=None, read=None, redirect=None, status=None)) after connection broken by 'SSLError(SSLCertVerificationError(1, '[SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed: certificate has expired (_ssl.c:1016)'))': /datasets/movielens/ml-100k.zipWhat’s happening
The error clearly signals an SSL handshake failure: certificate verify failed because the certificate is reported as expired. The same behavior reproduced both in Google Colab and in a WSL2-based VS Code environment. Multiple local mitigations were attempted, including toggling verification behavior, refreshing certifi, and overriding the default SSL context. None of those steps resolved the download.
Attempts and outcome
Environment-level workarounds did not change the result. The error persisted across runs and across environments, despite verification toggles and certificate package updates. The behavior suggested that the immediate local changes were not effective for this case.
Resolution and working code
An issue was filed in the googlecolab/colabtools repository. After that, running the same code later loaded the dataset successfully without any code changes. In other words, the pipeline began working again as-is.
import tensorflow as tf
import tensorflow_datasets as tfds
ds_ratings, ds_info = tfds.load("movielens/100k-ratings", split="train", with_info=True)Why this matters
Dataset ingestion is foundational for experimentation and production ML. An SSL verification failure can halt a workflow even when application code is correct. Recognizing the class of failure helps focus effort: if local toggles and certificate updates do not help and the error recurs across clean environments, the most productive move can be to monitor and retry, and, when appropriate, surface the problem to the platform or service maintainers. In this case, simply rerunning later restored normal operation.
Takeaways
First, keep the repro minimal to ensure the failure is isolated to the dataset fetch. Second, note the exact error text—here it flagged an expired certificate—which can guide whether local changes are likely to help. Third, if the same minimal code fails identically across environments and then later succeeds unchanged, treat it as a resolved external condition and proceed. The important part is to avoid unnecessary code churn when the pipeline itself is sound.