2025, Oct 18 04:00
Install pyarrow on AWS Lambda Python 3.11 without build failures: glibc 2.26, wheels, numpy pins
Trouble installing pyarrow on AWS Lambda Python 3.11? glibc 2.26 blocks wheels; numpy pinning misleads pip. Fixes: pin 20.0.0, use 3.12, or build a wheel.
Installing pyarrow in an AWS Lambda base image can look trivial until the dependency chain meets system-level constraints. A common scenario: pinning numpy below 2.3.0 on the Python 3.11 Lambda image and then adding pyarrow results in pip attempting to fetch numpy 2.3.2 anyway and failing during the build. The core of the problem is not just Python dependencies: it is the glibc version inside the container and the absence of a compiler.
Minimal reproduction
The setup below demonstrates the issue on the Python 3.11 AWS Lambda image. The first step installs a constrained numpy, the second adds pyarrow with the s3 extra.
FROM public.ecr.aws/lambda/python:3.11
RUN python -m pip install "numpy<2.3.0"
RUN python -m pip install "pyarrow[s3]"
Pip proceeds with something like:
Collecting numpy>=1.25
  Downloading numpy-2.3.2.tar.gz (20.5 MB)
  ...
What’s actually going on
The pyarrow installation path falls back to building from source inside the container. That fails because the image does not have a C compiler. The root cause for building from source in the first place is the glibc level of the base image. The Python 3.11 AWS Lambda image uses glibc 2.26. Pre-built wheels for pyarrow 21.0.0 require glibc at least 2.28, so pip cannot use a wheel and tries to compile.
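A quick way to confirm this inside any container is to inspect the glibc version directly; the diagnostic below is a small sketch (run it inside the image, for example via docker run):

```shell
# Print the glibc version that pip compares against manylinux wheel tags.
# pyarrow 21.0.0 wheels target manylinux_2_28 (glibc >= 2.28); the
# Python 3.11 Lambda image ships glibc 2.26, so no wheel matches.
python -c "import platform; print(platform.libc_ver())"
ldd --version 2>/dev/null | head -n 1 || true
```

If the reported version is below 2.28, pip has no usable pyarrow 21.0.0 wheel and will attempt a source build.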
Even with numpy pinned below 2.3.0, pyarrow declares a runtime requirement of numpy>=1.25, and on glibc 2.26 there is no compatible pyarrow wheel, so pip falls back to building pyarrow from source. That build runs in an isolated build environment, where pip installs the newest numpy allowed by pyarrow's build requirements, here 2.3.2, regardless of the pin already installed in the image. That numpy also has no wheel for glibc 2.26, and the process heads toward a source build and breaks.
Fixes that align with the environment
If the goal is to keep Python 3.11 and the Lambda base image, use a pyarrow release that still ships wheels compatible with older glibc. Pinning pyarrow to 20.0.0 avoids the source build on this image and installs cleanly.
FROM public.ecr.aws/lambda/python:3.11
RUN python -m pip install "numpy<2.3.0"
RUN python -m pip install "pyarrow[s3]==20.0.0"
Another route is to choose a base image with a newer glibc so recent pyarrow wheels are usable. The Python 3.12 AWS Lambda image uses glibc 2.34 and works with recent pyarrow releases.
FROM public.ecr.aws/lambda/python:3.12
RUN python -m pip install "numpy<2.3.0"
RUN python -m pip install "pyarrow[s3]==21.0.0"
When neither changing pyarrow nor the Python base image is acceptable, the remaining option is to build pyarrow from source in an environment with the same or older glibc, producing a wheel matched to glibc 2.26, then copy that wheel into the Lambda image and install it. The official guidance for building pyarrow is in the Apache Arrow documentation on building for Linux and macOS.
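Assuming such a wheel has already been produced (the filename below is hypothetical and depends on how the build was tagged), the Lambda image then only needs to copy and install it:

```dockerfile
FROM public.ecr.aws/lambda/python:3.11
# Hypothetical wheel filename: built beforehand against glibc 2.26
# following the Apache Arrow build documentation.
COPY pyarrow-21.0.0-cp311-cp311-linux_x86_64.whl /tmp/
RUN python -m pip install "numpy<2.3.0" /tmp/pyarrow-21.0.0-cp311-cp311-linux_x86_64.whl
```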
In practice, dependency resolution sometimes behaves more predictably when packages are installed at once. Installing numpy and pyarrow together in a single pip invocation, or using a requirements.txt with both entries, can help ensure the resolver sees the full set of constraints upfront. This is an optional workflow improvement and does not replace the need to match pyarrow wheels with the correct glibc.
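On the Python 3.11 image, that single-invocation approach looks like this (using the 20.0.0 pin from above):

```dockerfile
FROM public.ecr.aws/lambda/python:3.11
# One pip invocation: the resolver sees the numpy pin and the pyarrow
# requirement together instead of resolving them in separate steps.
RUN python -m pip install "numpy<2.3.0" "pyarrow[s3]==20.0.0"
```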
Why this matters
Python packaging often hides the fact that native extensions depend on system libraries. When glibc is older than what a pre-built wheel targets, pip falls back to building from source. Without a compiler toolchain in the image, the build fails. Recognizing this relationship between base image glibc, wheel compatibility, and pyarrow’s binary distribution explains why pinning numpy alone won’t solve the underlying problem.
Takeaways
If you’re on the Python 3.11 Lambda image and need pyarrow quickly, pin pyarrow to 20.0.0. If you can move to Python 3.12, keep using current pyarrow and install the wheels directly. If versions are fixed for other reasons, compile pyarrow against the target glibc and install the resulting wheel in your Lambda container. For cleaner dependency resolution, consider installing numpy and pyarrow together in a single pip command or via requirements.txt, but keep in mind that the glibc-versus-wheel compatibility is the decisive factor.
The article is based on a question from StackOverflow by Flo and an answer by Nick ODell.