2025, Dec 25 01:00
Resolve AWS Glue 3.0 PythonShell Dependency Errors by Pinning Compatible Versions in Your Wheel
Fix AWS Glue 3.0 PythonShell startup failures from package conflicts: pin compatible dependencies in requirements.txt, rebuild the wheel, and stabilize ETL jobs.
Package conflicts in AWS Glue PythonShell jobs are a classic source of startup failures. When your wheel bundles versions that don’t match the Glue 3.0 runtime expectations, the job exits almost immediately and CloudWatch fills with incompatibility messages. In a data lake ETL and analytics setup that leans on S3 for storage, Glue for job orchestration, and Athena for querying, this becomes a productivity trap unless the dependency set is aligned with Glue’s constraints.
What the failure looks like
In the logs you will see lines that point straight at version mismatches between your bundled libraries and the runtime. The essence of the problem is captured by messages like these:
scipy 1.8.0 requires numpy<1.25.0,>=1.17.3, but you have numpy 2.0.2 which is incompatible.
redshift-connector 2.0.907 requires pytz<2022.2,>=2020.1, but you have pytz 2025.2 which is incompatible.
awswrangler 2.15.1 requires numpy<2.0.0,>=1.21.0, but you have numpy 2.0.2 which is incompatible.
awswrangler 2.15.1 requires pandas<2.0.0,>=1.2.0, but you have pandas 2.2.3 which is incompatible.
awscli 1.23.5 requires botocore==1.25.5, but you have botocore 1.38.9 which is incompatible.
awscli 1.23.5 requires s3transfer<0.6.0,>=0.5.0, but you have s3transfer 0.12.0 which is incompatible.
aiobotocore 2.2.0 requires botocore<1.24.22,>=1.24.21, but you have botocore 1.38.9 which is incompatible.
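Each of these messages is pip's resolver reporting that an installed distribution declares a version window and the bundled version falls outside it. As a simplified sketch of the comparison behind a message like "scipy 1.8.0 requires numpy<1.25.0,>=1.17.3, but you have numpy 2.0.2" (real tools use the packaging library, which also handles pre-releases and epochs):

```python
def parse(version: str) -> tuple:
    """Turn 'X.Y.Z' into a comparable tuple of ints."""
    return tuple(int(part) for part in version.split("."))

def in_window(version: str, lower: str, upper: str) -> bool:
    """True when lower <= version < upper (inclusive lower, exclusive upper)."""
    return parse(lower) <= parse(version) < parse(upper)

# scipy 1.8.0 declares numpy<1.25.0,>=1.17.3 -- numpy 2.0.2 fails the check:
print(in_window("2.0.2", "1.17.3", "1.25.0"))   # False: 2.0.2 is past 1.25.0
print(in_window("1.24.4", "1.17.3", "1.25.0"))  # True: inside the window
```

The fix described below is simply to choose bundled versions that make every such check pass.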
The job itself may be created from a Python controller that triggers a PythonShell job with a wheel stored in S3. A typical configuration looks like this, where the script location and wheel are in your bucket and the runtime is set to Python 3.9 on Glue 3.0:
response = glue_client.create_job(
    Name=cfg["aws_glue"]["etl_jobs"][0]["name"],
    Role=iam_role_arn,
    JobMode="SCRIPT",
    ExecutionProperty={"MaxConcurrentRuns": 1},
    Command={
        "Name": "pythonshell",
        "ScriptLocation": cfg["aws_glue"]["S3_URI"],
        "PythonVersion": "3.9",
    },
    DefaultArguments={
        "--TempDir": str(cfg["s3_bucket"]["bucket"] + cfg["s3_bucket"]["temp"]),
        "--extra-py-files": str(
            cfg["s3_bucket"]["bucket"]
            + cfg["s3_bucket"]["dependencies"]
            + "pokemon_datalake_and_anltx-0.1.0-cp39-none-any.whl"
        ),
        "--job-language": "python",
    },
    MaxRetries=0,
    GlueVersion="3.0",
    Description="Job for processing raw Pokémon data.",
)
print(f"Job has been created: {response['Name']}")
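After creating the job, the controller typically starts a run and watches it fail. A minimal polling helper makes the failure visible without opening the console; this is a sketch, and `wait_for_run` is a hypothetical name, but `get_job_run` is the real boto3 Glue API:

```python
import time

def wait_for_run(glue, job_name: str, run_id: str, poll_seconds: int = 15) -> dict:
    """Poll a Glue job run until it leaves the in-progress states.

    `glue` is any client exposing get_job_run, e.g. boto3.client("glue").
    Returns the final JobRun dict.
    """
    while True:
        run = glue.get_job_run(JobName=job_name, RunId=run_id)["JobRun"]
        if run["JobRunState"] not in ("STARTING", "RUNNING", "STOPPING", "WAITING"):
            return run
        time.sleep(poll_seconds)
```

With a conflicting wheel, the final state is FAILED and the run's ErrorMessage points at the CloudWatch log stream where the incompatibility lines above appear.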
Why it breaks
The AWS Glue 3.0 PythonShell runtime expects specific, mutually compatible library versions. If your wheel bundles libraries like numpy, pandas, botocore, or awswrangler that exceed the bounds of what the runtime supports, Glue surfaces the conflict and aborts the run. Even if your local venv or conda is on Python 3.9 and your wheel is built for cp39, the versions inside that wheel still must satisfy the constraints of the Glue 3.0 environment.
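A cheap way to see what actually loads at runtime is to have the job script log the versions it resolved before doing any real work. A hedged sketch (the package list is illustrative):

```python
from importlib import metadata

def report_versions(packages: list[str]) -> dict[str, str]:
    """Return the installed version of each named package, or 'not installed'."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions

# Print this at the top of the job script so the CloudWatch log records
# exactly which versions the wheel shipped into the Glue 3.0 environment.
print(report_versions(["numpy", "pandas", "botocore", "awswrangler"]))
```

Comparing that output against the runtime's expected windows usually identifies the offending pin in one run.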
How to resolve it
The fix is to align your dependency set with the Glue 3.0 constraints and keep your job configuration stable. Start by preparing a requirements.txt that pins compatible versions. Use this requirements file to build your wheel so that the bundled dependencies respect the Glue 3.0 expectations. Ensure your Python script imports these packages and nothing that drags in incompatible transitive versions. Upload the script to S3 in the scripts location you are already using. Keep the Glue job configuration as shown, referencing your script and the wheel in S3. Create or recreate the job using your standard flow; if you prefer the CLI for provisioning, create the Glue job from there using the same configuration details.
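For example, a requirements.txt whose pins sit inside the windows reported in the log above might look like the following. The exact versions are illustrative; verify them against your own error output and the Glue 3.0 documentation:

```text
# Pins chosen to satisfy the version windows from the error log above;
# illustrative only -- confirm against your Glue 3.0 runtime.
numpy==1.24.4        # scipy 1.8.0 wants <1.25.0, awswrangler wants <2.0.0
pandas==1.5.3        # awswrangler 2.15.1 wants <2.0.0,>=1.2.0
pytz==2022.1         # redshift-connector 2.0.907 wants <2022.2,>=2020.1
s3transfer==0.5.2    # awscli 1.23.5 wants <0.6.0,>=0.5.0
```

Note that the log's two botocore constraints (awscli 1.23.5 wants ==1.25.5, aiobotocore 2.2.0 wants <1.24.22) cannot both be satisfied by any pin, which is a hint that runtime-provided packages like botocore are best left out of your wheel entirely and inherited from the Glue environment.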
Corrected configuration in context
With the dependency set pinned to compatible versions, the job definition itself does not need structural changes. The creation call can remain functionally identical, continuing to use PythonShell on Python 3.9 with the wheel in S3:
job_resp = glue_svc.create_job(
    Name=conf["aws_glue"]["etl_jobs"][0]["name"],
    Role=role_arn,
    JobMode="SCRIPT",
    ExecutionProperty={"MaxConcurrentRuns": 1},
    Command={
        "Name": "pythonshell",
        "ScriptLocation": conf["aws_glue"]["S3_URI"],
        "PythonVersion": "3.9",
    },
    DefaultArguments={
        "--TempDir": str(conf["s3_bucket"]["bucket"] + conf["s3_bucket"]["temp"]),
        "--extra-py-files": str(
            conf["s3_bucket"]["bucket"]
            + conf["s3_bucket"]["dependencies"]
            + "pokemon_datalake_and_anltx-0.1.0-cp39-none-any.whl"
        ),
        "--job-language": "python",
    },
    MaxRetries=0,
    GlueVersion="3.0",
    Description="Job for processing raw Pokémon data.",
)
print(f"Job has been created: {job_resp['Name']}")
The important change happens in the dependency set you package, not in the job creation call. Once your wheel reflects a requirements.txt that respects the runtime’s version window, the incompatibility messages disappear.
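To keep that discipline enforceable, a small pre-build check can assert that every line in requirements.txt is an exact pin, so a floating range never sneaks into the wheel. A hedged sketch (the file path is an assumption about your repo layout):

```python
import re
from pathlib import Path

def unpinned_lines(requirements_text: str) -> list[str]:
    """Return requirement lines that are not exact '==' pins."""
    bad = []
    for raw in requirements_text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if not re.match(r"^[A-Za-z0-9_.\-\[\]]+==[A-Za-z0-9_.\-]+$", line):
            bad.append(line)
    return bad

if __name__ == "__main__":
    # "requirements.txt" at the repo root is an assumption; adjust to taste.
    offenders = unpinned_lines(Path("requirements.txt").read_text())
    if offenders:
        raise SystemExit(f"Unpinned requirements: {offenders}")
```

Running this before the wheel build turns a silent dependency drift into a loud, early failure.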
Why this knowledge matters
Glue’s managed runtime gives you a predictable base, but it also means your third-party stack must fit within its boundaries. When building data lake ETL on top of Glue 3.0, dependency discipline saves hours of debugging and avoids flakiness across environments. Tying your wheel build to a requirements.txt that matches Glue prevents hidden upgrades, transitive surprises, and broken runs after innocuous changes.
Practical takeaways
Pin your package versions explicitly for Glue 3.0 and rebuild the wheel with those constraints. Keep your PythonShell job on Python 3.9 as configured and point to the script and dependencies in S3. Create or update the Glue job with the same parameters, and let the aligned dependency set do the heavy lifting. This approach stabilizes your ETL pipeline and keeps your analytics stack consistent across development and production.