2025, Nov 10 17:00

Safely Convert C/C++ uint8_t* Buffers to Python with ctypes: Bytes Copy or Zero-Copy NumPy Arrays

Pass a raw C/C++ uint8_t* buffer to Python with ctypes: safely handle length, and convert to bytes or a zero-copy NumPy array via numpy.ctypeslib.as_array.

Passing a raw C/C++ byte buffer to Python with ctypes looks simple until you need to handle length and performance. A function returns a uint8_t* with an out-parameter for size, there is no null terminator, and you want a bytes object or a NumPy array without a slow Python loop. The core question is how to safely and efficiently convert that pointer to a usable Python type.

Problem setup

Below is a minimal shape of the interop. The buffer is arbitrary-length numeric data; there is no implicit terminator, so a pointer alone isn’t enough.

// C/C++
uint8_t* fetch_buf(uint32_t* out_len);  // returns an arbitrary-length numeric buffer

# Python
import ctypes as ct
lib = ct.CDLL("libshared.so")
lib.fetch_buf.argtypes = [ct.POINTER(ct.c_uint32)]
lib.fetch_buf.restype = ct.POINTER(ct.c_uint8)
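size_out = ct.c_uint32(0)
ptr = lib.fetch_buf(ct.byref(size_out))  # pass the out-parameter by reference
# ptr is a POINTER(c_uint8) and size_out.value holds the length; what now?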

The goal is a type that converts cleanly to a NumPy array.

What’s actually going on

A raw C pointer doesn’t carry size information. Python’s bytes constructor also doesn’t infer size from a pointer, so you must explicitly pair the pointer with the length you received via the out-parameter. From there, you have three practical options that cover common needs: produce a bytes object as a copy, create a Python-side list of ints via slicing, or wrap the buffer into a NumPy array that shares the original memory.
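
As a minimal sketch of that pairing, the hypothetical helper below (`buffer_to_bytes` is an illustrative name, not part of ctypes) rejects a NULL pointer and copies exactly `length` bytes. A ctypes array stands in for the C allocation here, which is an assumption made so the snippet runs without a shared library.

```python
import ctypes as ct

def buffer_to_bytes(ptr, length):
    """Copy `length` bytes from a POINTER(c_uint8) into a bytes object.

    Hypothetical helper for illustration; not part of ctypes.
    """
    if not ptr:  # a NULL ctypes pointer evaluates as false
        raise ValueError("NULL pointer returned from C")
    return ct.string_at(ptr, length)

# Stand-in for the C side: a 5-byte ctypes array plus its length.
backing = (ct.c_uint8 * 5)(0, 1, 2, 3, 4)
ptr = ct.cast(backing, ct.POINTER(ct.c_uint8))
size_out = ct.c_uint32(len(backing))

data = buffer_to_bytes(ptr, size_out.value)

# A default-constructed pointer is NULL, which the helper refuses.
null_ptr = ct.POINTER(ct.c_uint8)()
```

Checking the pointer before dereferencing it matters because reading through a NULL pointer from Python typically crashes the interpreter rather than raising an exception.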

Using ct.string_at(ptr, length) returns a bytes object by copying the memory region of the specified size. Slicing the pointer like ptr[:length] builds a new Python list of ints representing the byte values. Using np.ctypeslib.as_array(ptr, shape=(length,)) wraps the existing buffer without copying; it returns a numpy.ndarray that shares the underlying memory, which is the preferred path when performance matters.
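
The explicit length is what makes the copy safe for binary data. The sketch below, which again uses a ctypes array as an assumed stand-in for the C buffer, contrasts `ct.string_at` with and without a size on a buffer that contains a zero byte, and shows the list produced by slicing.

```python
import ctypes as ct

# Stand-in for a C buffer that contains a zero byte in the middle.
backing = (ct.c_uint8 * 5)(1, 2, 0, 3, 4)
ptr = ct.cast(backing, ct.POINTER(ct.c_uint8))
length = len(backing)

full = ct.string_at(ptr, length)  # explicit size: all five bytes are copied
truncated = ct.string_at(ptr)     # no size: treated as zero-terminated
as_list = ptr[:length]            # new Python list of ints

print(full)
print(truncated)
print(as_list)
```

Without a size, `ct.string_at` stops at the first zero byte, so it would silently drop data here and could read past the end of a buffer that contains no zero at all.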

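If the C side allocated the buffer, release it through a function exported by the same library, so the allocator that created the memory also frees it. After copying (bytes or list), you can free immediately. If you share memory with NumPy, don’t free until you’re done with the array and with any views derived from it.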
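
To illustrate why the timing matters, here is a small sketch that compares a shared NumPy view with an independent copy. It assumes a ctypes array as a stand-in for the C allocation and uses `ct.memset` to imitate the memory being overwritten after a free; a real freed buffer could behave far worse than this simulation.

```python
import ctypes as ct
import numpy as np

# Stand-in for the C allocation (assumption: no real shared library here).
backing = (ct.c_uint8 * 5)(0, 1, 2, 3, 4)
ptr = ct.cast(backing, ct.POINTER(ct.c_uint8))
n = len(backing)

view = np.ctypeslib.as_array(ptr, shape=(n,))  # shares memory with the buffer
snapshot = view.copy()                         # owns its own memory

# Imitate the C side scribbling over the memory after it was released.
ct.memset(backing, 0xDD, n)

print(view)      # reflects the overwritten memory
print(snapshot)  # still holds the original values
```

If the data must outlive the C buffer, taking a `.copy()` before calling the release function is one straightforward option, at the cost of one extra allocation.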

Solution in practice

If you specifically want something convertible to NumPy, you can go straight to a NumPy array using the shared-memory approach. If you prefer a bytes object or a Python list, use the corresponding copy-based methods. Here is a working example demonstrating all three paths.

Working example (Windows)

The C source below allocates a five-byte buffer, fills it with 0..4, returns the pointer and its size, and exports a function that frees the buffer.

// file: demo.c
#include <stdint.h>
#include <stdlib.h>

__declspec(dllexport)
uint8_t* fetch_buf(uint32_t* out_len) {
    uint8_t* buf = malloc(5);
    if (buf == NULL) {  // allocation failed: report an empty result
        *out_len = 0;
        return NULL;
    }
    for (uint8_t i = 0; i < 5; ++i)
        buf[i] = i;
    *out_len = 5;
    return buf;
}

__declspec(dllexport)
void release_buf(uint8_t* p) {
    free(p);
}

# file: demo.py
import ctypes as ct
import numpy as np

mod = ct.CDLL('./demo')
mod.fetch_buf.argtypes = (ct.POINTER(ct.c_uint32),)
mod.fetch_buf.restype = ct.POINTER(ct.c_uint8)
mod.release_buf.argtypes = (ct.POINTER(ct.c_uint8),)
mod.release_buf.restype = None

count = ct.c_uint32(0)
ptr = mod.fetch_buf(count)

# 1) Copy to bytes using ctypes.string_at
as_bytes = ct.string_at(ptr, count.value)
print(as_bytes)  # bytes

# 2) Copy to Python list by slicing the pointer
as_list = ptr[:count.value]
print(as_list)  # list of int

# 3) Share the buffer with NumPy (no copy)
as_np = np.ctypeslib.as_array(ptr, shape=(count.value,))
print(as_np)  # numpy array
as_np[0] = 7  # modify via numpy...
print(as_np)  # numpy array
print(ptr[0])  # ...modifies original buffer as well

mod.release_buf(ptr)  # as_np shares this memory: do not use as_np after this call

Output:

b'\x00\x01\x02\x03\x04'
[0, 1, 2, 3, 4]
[0 1 2 3 4]
[7 1 2 3 4]
7

Why this is important

Interop performance and memory safety depend on these details. Choosing between a copy and a shared buffer affects both speed and lifecycle management. If you need high throughput and plan to process the data in place, the NumPy shared-buffer approach avoids extra allocations and data movement. If you just need an immutable snapshot, bytes is a clean, self-contained copy. In either case, pairing the pointer with its size and freeing at the right time prevents leaks and use-after-free bugs.

Takeaways

Always carry explicit length across the boundary; a pointer alone isn’t enough. For a direct NumPy workflow, wrap with np.ctypeslib.as_array and specify shape, then delay freeing until you’re done. For self-contained data, use ctypes.string_at to build bytes or slice to a Python list and free immediately afterward. With these patterns, you avoid slow Python loops and keep ownership and performance under control.
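
One way to make the "free at the right time" rule harder to forget is to tie the release call to a scope. The sketch below is an illustration, not an established API: `borrowed_buffer`, `fake_fetch`, and `fake_release` are hypothetical names, and the two stand-in functions replace the real library so the snippet runs on its own. With the article's example you would pass `mod.fetch_buf` and `mod.release_buf` instead.

```python
import ctypes as ct
from contextlib import contextmanager

@contextmanager
def borrowed_buffer(fetch, release):
    """Yield (ptr, length) from fetch() and always pass ptr to release()."""
    length = ct.c_uint32(0)
    ptr = fetch(length)  # with argtypes set, ctypes passes this by reference
    if not ptr:
        raise MemoryError("fetch returned a NULL pointer")
    try:
        yield ptr, length.value
    finally:
        release(ptr)  # runs even if the body raises

# Pure-Python stand-ins so the sketch runs without a shared library.
_backing = (ct.c_uint8 * 5)(0, 1, 2, 3, 4)
released = []

def fake_fetch(out_len):
    out_len.value = len(_backing)
    return ct.cast(_backing, ct.POINTER(ct.c_uint8))

def fake_release(p):
    released.append(True)

with borrowed_buffer(fake_fetch, fake_release) as (p, n):
    data = ct.string_at(p, n)  # copy while the buffer is still valid

print(data, released)
```

Anything that shares the buffer, such as a NumPy array built with `np.ctypeslib.as_array`, should be used only inside the `with` block; copies made inside it remain valid afterward.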

The article is based on a question from StackOverflow by user21391767 and an answer by Mark Tolonen.